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A major contribution to system outage in a terrestrial digital radio 
channel is deep fading of the frequency transfer characteristic, which 
in addition to causing a precipitous drop in received signal-to-noise 
ratio (s/n) also causes signal dispersion that can result in severe 
intersymbol interference. Because the temporal variation of the chan- 
nel is slow compared to the signaling rate, the information theoretic 
channel capacity and the “Efficiency Index” in bits/cycle—a figure- 
of-merit we use for the communication techniques considered—can 
be viewed as random processes. Starting from an established math- 
ematical model characterizing fading channels (derived from exten- 
sive measurements), we estimate the probability distribution of chan- 
nel capacity and the distributions of efficiency indices for different 
communications techniques. The repertoire of communication meth- 
ods considered involves quadrature amplitude modulation with 
adaptive linear and decision feedback equalization, and maximum 
likelihood sequence estimation. For specific outage objectives the 
maximum number of bits per cycle achievable by each technique is 
estimated. The sensitivity of the distributions to bit-error-rate objec- 
tive and unfaded s/n is assessed. For certain desired operating points 
the efficacy of adaptive equalization is demonstrated. There are some 
operating points where adaptive equalization alone is not adequate 
and therefore space diversity should be considered. An estimate of 
the effect of frequency diversity is also included. 


42S 


I. INTRODUCTION AND SUMMARY 


Fading of terrestrial digital radio channels owing to multipath re- 
ception is a prime cause of system outage. For a specific hop a 
mathematical model of these fades has been developed by W. D. 
Rummler’” from extensive measurements of the channel frequency 
power transfer characteristic over time. The radio channel has a time- 
varying frequency characteristic, with additive Gaussian noise; how- 
ever, the temporal variations are sufficiently slow in comparison to the 
data symbol rate that the characteristics can be represented as a 
random ensemble of static frequency power transfer functions. The 
presence of additive noise implies that each member of the ensemble 
is limited to a maximum rate of transmission of data, depending on 
the communication method. For each specific communication tech- 
nique, the stochastic nature of the channel makes it meaningful to 
consider the probability distribution of data rates that can be sup- 
ported at a certain bit-error-rate objective. 

The purpose of this paper is to explore the relative performance of 
various communication techniques employing quadrature amplitude 
modulation (QAM), distinguished by the type of equalization method 
used. These techniques include variants of linear equalization, decision 
feedback equalization, and maximum likelihood sequence estimation 
(MLSE). For these methods a unified set of Chernoff bounds on the 
probability of error is obtained. Given a communication method, a 
channel impulse response, an error-rate objective, a received unfaded 
channel s/n, a channel bandwidth, and a signaling rate we use the 
Chernoff bounds to estimate the maximum number of bits per cycle of 
bandwidth (not necessarily integer-valued) for which the constraints 
are met. By computing the maximum number of bits per cycle sup- 
ported by each member of a large representative population of chan- 
nels, we obtain the cumulative probability distribution function. One 
can use the cumulative distribution curve to determine the probability 
of outage at a prescribed bit rate. 

The information theory bound on the number of bits per cycle 
attainable is also derived. In a sequence of plots we compare the 
different schemes with each other and with the information theory 
limit. 

If F(r) is the probability distribution function of data rates associ- 
ated with a communication method and we set an outage objective, é, 
then the value r; for which F(r;) = represents the maximum data 
rate at which it is possible to transmit and meet the outage objective. 
We present and discuss these distributions in the context of desired 
long-haul and short-haul outage objectives and rates associated with 
the digital hierarchy constraints. The efficacy of adaptive equalization 
is established. The advantage of decision feedback and MLSE over 
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optimum linear equalization is not very substantial. There are some 
desired operating points for which space diversity should be consid- 
ered. 

For a fair comparison of different communication techniques, the 
transmitter filter shape must be optimized for a fixed transmitter 
power. We found the performance to be insensitive to whether or not 
the transmitter filter is optimized and we provide a theoretical guide- 
line to indicate when this optimization becomes significant. 

Our results indicate that optimized equalizer structures yield data 
rates only a few bits/cycle below channel capacity. It therefore appears 
that higher dimensional constellations® spanning two to four symbol 
intervals could go a long way toward obtaining that which can be 
expected practically. Although we did not analyze higher dimensional 
signal design or optimize the constellation in QAM, it is reasonable to 
expect that these techniques can offer at most an equivalent few dB 
increase in s/n. Another method of achieving coding gain “of the order 
of 3-4 dB” is described in Ref. 4. Moreover, the real limitations on the 
selection of signal points in a practical system will most likely arise 
from the nonlinear operation of radio frequency (RF) power amplifiers 
rather than from s/n limitations. 

We argue the merits of adaptive transversal equalization and provide 
numerical support for our claims. This is not to say that fixed or even 
adjustable bump and/or slope equalizers in the frequency domain 
could not provide adequate performance in some cases. However, 
fluctuating (and sometimes nonminimum) phase distortion associated 
with fading and other linear filters admits robust and stable compen- 
sation via adaptive transversal filters. These structures with adjustable 
taps can automatically equalize any phase characteristic without noise 
enhancement and therefore are natural candidates in these applica- 
tions, especially at a high number of levels where even small amounts 
of phase distortion can degrade system performance. 

Our analysis was carried out with ideal models and an infinite 
number of taps. The actual number of taps needed in any application 
would be determined from experiments and/or more detailed analysis. 


ll. THE EQUALIZED QAM SYSTEM—IDEALIZED MODEL 


The use of equalizers to mitigate the effects of intersymbol interfer- 
ence and noise in voiceband data transmission is by now standard 
practice. We are thus naturally led to consider the application of these 
techniques in digital data transmission over the radio channel where 
slowly varying, frequency-selective fading is the predominant impair- 
ment. Here we review and derive the applicable mathematical theory 
that will be used in the sequel to evaluate the system performance 
indices. 
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To focus on basics and avoid extensive numerical analysis, we 
consider idealized equalizer models represented as transversal filters 
with an infinite number of taps. Tap adjustment algorithms are well 
established and our formulas are derived under the assumption that 
the taps have converged to their optimum values. 

Our analyses are based on the digital communications model de- 
picted in Fig. 1. To appreciate the applicability and generality of this 
baseband model to digital radio communications, we observe that, for 
any bandpass linear channel, the output waveform, when the input is 
any linearly modulated data wave, can be represented as 


s(t) = Re 5 G,h(t — nT + to)exp[i(27fot + ont, 


where Re{-} stands for the “real part.” The data symbols {d@,} trans- 
mitted at T-second intervals, are statistically independent and, in two- 
dimensional modulation systems such as QAM, they assume complex 
values. The overall equivalent baseband impulse response, /(t), is also 
complex-valued. The real part represents the in-phase response, while 
the imaginary part is the quadrature component. The frequency, /fo, is 
the carrier frequency, 6 is the carrier phase, and, fp is the timing phase. 
Ideal demodulation with a known carrier frequency fo and carrier 
phase @ implies a translation of the received bandpass signal to base- 
band. The real part of the resulting complex signal represents the in- 
phase modulation, while the imaginary part is the quadrature modu- 
lation. This then is the rationale, in addition to economics of notation, 
for using the complex baseband model depicted in Fig. 1. 

We restrict our treatment to ideal Nyquist systems with no excess 
bandwidth. This permits less cumbersome calculations without loss of 
physical insights. We also derive our formulas by assuming flat trans- 
mitting filters and prove later that in-band optimum shaping yields 
imperceptible additional benefits. Also neglected is adjacent channel 
interference, as ideal bandlimiting eliminates this problem. 
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Fig. 1—~Complex baseband model for QAM data transmission. 
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We now return to Fig. 1 and discuss the various functions and 
notations indicated. Without loss of generality we assume that the 
complex data symbols, {d@, = a, + 1b,}"., take on values on a set of 
positive and negative odd integers with equal probability. Accordingly, 


L?-1 
Ea,=0 and E|d,|*=2———, 





where E(-) denotes mathematical expectation and L (even) is the 
maximum number of data levels assumed by a, and 6,. Thus, in QAM 
L? data points are available for conveying information and the source 
therefore generates 


pe logeL : 





bits/sec. (1) 


For a given channel bandwidth, W; the efficiency index is defined as 


2 logeL 
WT 


As we shall see, the relationship among P-- probability of error, s/n, 
W, T and H (w)-channel frequency characteristics is rather complicated. 
The determination of the relationship for different communication 
techniques is our chief task in the sequel. 

From a mathematical point of view, the fading radio channel is 
characterized by a slowly varying linear distorting filter whose base- 
band equivalent complex impulse response is the Fourier transform of 
the transfer function H (w), shifted to zero frequency: 

2rWw 
~ : ., AW 
A(t) = Ai(t) + tho(t) = H(w)e™ on 


—21h 


l=R/W= 





bits/cycle. (2) 


At the receiver the added complex noise process, 0(t) = u(t) + 
iv2(t), is assumed to be white Gaussian with v(t) independent of v2(t) 
and each possessing a double-sided spectral density, No/2. So, 


Ela(t)|? = Evi(t) + Ev3(t) 
= Nod(0), 


where 6(7) is the Dirac delta function. The average transmitted signal 
power, Po, for a flat transmitting filter can easily be calculated. 
However, for our purposes a more relevant quantity is the received, 
unfaded signal power 


L?’-1K? 


= 2 = ee 
P K*Po 2 3 T?? 





where K is a constant that includes the effects of amplifiers, antennas, 
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and the unfaded channel loss. Also, the added average noise power in 
the Nyquist band, W= 1/2T, is 
No 
P,=—. 
T 
Thus the unfaded received s/n, a most important system parameter, 
is 
L?-1K? 1 
=p = 2—__—__. 3 
s/n =p 3 MT (3) 
The receiver structures under consideration consist of a perfect 
demodulator followed by a front-end filter possessing the complex 
impulse response W(é), a sampler, a decision device, and a canceler. 
The design of an optimum receiver entails the selection of W(t) and 
the canceler for a particular channel characteristic. Since the channel 
characteristics are usually unknown to the receiver, these components 
must be determined adaptively. 
To understand the function of the canceler, consider the signal 
sample at the output of filter W(¢), 
Xn = DY FrGn-r + Zr, -MW SNS, (4) 
k=—00 
where 7; = 7(RT + to) is the overall complex-sampled system impulse 
response and Z, is the Gaussian noise output sample. Ideally the 
canceler strives to synthesize the value 
Vn = » FrAn-k (5) 
keS 
and to subtract it from (4) where the set of integers S is defined as 
{RES:k=—-N,--- —1,1-++ No}. The canceler’s ability to synthesize 
these values presumes that some past (Rk = 1, --- , Ne) and/or future 
(k = —1, --- , —N;) transmitted data symbols are perfectly detected 
and, moreover, that the set of complex numbers, 7;, are adaptively 
estimated. _ 
The front-end filter, W(t), is usually determined adaptively by 
minimizing the mean-squared error (MSE) between the sample, x, — 
Yn, and the expected data symbol a,: 


MSE[N,, No, W(t)] = E|Xn — Jn — Gn’, (6) 
and the optimum filter, Wit), is chosen to achieve 
(MSE) = min MSE[Ni, No, W(t)] = MSE[Ni, No, Wolt)]. (7) 
t 


Since (6) is a quadratic functional of W(t), a unique minimum can 
always be found. It is standard to represent the linear filter W(t) by a 
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transversal structure and in practice the search for the minimum is 
accomplished by varying the taps of this filter until a minimum of the 
time average of the squared error is found. Clearly, to realize such a 
minimization procedure, estimates of the transmitted data symbols 
must be used. 


lll, SYSTEM PERFORMANCE—GENERAL 


To get at the efficiency index of a system, the error rate as a function 
of data rate for any choice of the canceler set, {S}, and front-end filter, 
W(t), must be explicitly expressed. Unfortunately, exact relationships 
are not mathematically tractable for the simplest of systems and so we 
must employ upperbounds. Fortunately, for the systems under consid- 
eration, it is possible to obtain exponentially tight inequalities. 

With this approach in mind, note that after perfect cancellation, the 
decision variable, from (4) and (5) becomes 


Sn = Xn — In 


= PoGn + y; F.Qn—k + Pas (8) 
kEJ 


where now the set JisSU0, {RE J: Rk =—N,--- 0-++ Ni}. Decisions 
in QAM are made on the real part of s, and, separately, on the 
imaginary part of s,. Simple calculations give 


Re(Sn) = pbodn — Vobn + » (UrGn—k — UrDn-z) + Zn; 
KEI 


and 
Im(sn) = pobn + Votn + Y, (Uxbn—-z + UVEAn-z) + Zn2, (9) 
kJ 
where 
r, = Be + Wp, 
and 
Q2rVv 
Zp = 2n1 + 122 = [ b(t) W(t)dt. 
—Qr wv 


For an L-level system, slicing levels are placed at 0 + 2 --- + 
fio(L — 2) and compared with the received samples Re(s,) and Im(s,). 
An error occurs whenever the noise plus intersymbol interference (in- 
phase and quadrature) exceed in magnitude the distance from the 
transmitted level to the nearest decision threshold, po. However, the 
outside two levels can be in error in one direction only. 

Now denote the event of an error committed in the “real” rail by E, 
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and in the “imaginary” rail by E;. Then the probability of system 
error, P., is the probability of either (or both) E, or E; occurring, 


P. = P(E,-UE;) = P(E,) + P(Ei), (10) 








where 

P(E,) = (: — z) Pr Zn — ds, (UrAn—k — UEOn-2) + Vobn| = 1 | 
and 
P(Ei) 


1 
= (1-3) Pr| = 1 (11) 
Because of symmetry, P(E-) = P(E;) = P(E), and so we only need to 
upperbound P(E). 
We adopt a bounding procedure introduced by B. Saltzberg’ to 
analyze the error rate in an unequalized baseband system. We have 
extended Saltzberg’s approach to our systems and it can be shown that 


P(E; A, B, 6) 








Zn2— , (prGn-z + UrAn-z) — Voto 
het 


2 
& (L — 1) ( |pe| + |u| + sv) 
kEA 


= 2 exp (12) 





2 
2 ot, +4 3 “| 3 i+ ot+a-ao8|| 


kEB 


The set of integers A and B form a partition on the set of integers not 
included in J. That is, 


AUB=Q= {k:k EJ} 
and 
ANB=¢9. 
The variable 6 = 1 or 0, and 


No [ice 
oi] | W(t) |?dt. 


The sharpest upperbound is obtained by minimizing (12) with respect 
to the sets A, B, and 6. Algorithms for carrying out this minimization 
can be devised readily. 
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IV. SYSTEM PERFORMANCE 


4.1 Discussion 


Equation (12) is a rather general upperbound on the error rate for 
any passband linear data transmission system and it will now be 
specialized to include the effects of the different choices of equaliz- 
ers.Before proceeding with the detailed numerical analysis, we need to 
make a connection between the mean-squared error (MSE), which is 
minimized by equalizers, and the system probability of error, which, 
ideally, should be the quantity minimized. 

A straightforward but tedious approach for getting at the error rate 
might be to first determine the filter, W(t), which minimizes the MSE 
for any particular equalization scheme, calculate the overall resulting 
impulse response, and then use eq. (12) to upperbound the error rate. 
This approach can be circumvented by exploiting the explicit relation- 
ship between the minimum MSE and the value of the overall impulse 
response at t = f) when the optimum filter, Wo(t), is used. 

The optimum structure of the minimum mean-squared error receiver 
can be shown to consist® of a matched filter in cascade with a trans- 
versal filter combined with a linear intersymbol interference canceler. 
The implication of this structure is that the resulting overall system 
transfer function is a real function of frequency. Or, the complex- 
sampled impulse response, {uz + iv, }2., must be a real number at 
k = 0, which results in Up = 0. This follows from the Fourier Transform 
representation of 7(t), from which we see that at ¢ = 0 the integrand 
is real and nonnegative. Indeed the overall phase characteristic has 
been removed by the matched filter (without enhancing the noise”*). 
Using the fact that vo = 0 and careful numerical analysis of the 
available channel characteristics, our calculations showed that for all 
practical purposes the bound (12) becomes 


2 
PUR S\ ei a || (13) 
2| +o(L) Y (wet “| 
kéS 


L’-1 





where we set o7(L) = 


As will become apparent, the argument of the exponential function 
in (13) can be directly related to the minimum mean-squared error. 


* A fractionally spaced (7/2) transversal filter can automatically synthesize any 
rasa pes filter and thus eliminate phase distortion and also compensate for timing phase 
(see Ref. 7.) 


DIGITAL COMMUNICATIONS 437 


Towards this end we recall a well-known® result that states that the 
best achievable MSE has the simple representation, 


(MSE)o = 20°(L)(1 — po), (14) 


where po is the sample at ¢ = t at the output of the optimum filter. 
Also, when the optimum filter, Wo(t), is used, a straightforward cal- 
culation of the resulting MSE gives 


(MSE)o = 207(L)(1 — po)” 
+20°(L) ¥ (uz + v3) + 203,. (15) 
kéS 


Relationships (14) and (15) make it possible to write (13) as 


ets | (MSE)o |” 
P(E; S) <2 op |- aaa E | 


~2exp — for No—0O, (16) 


1 
(MSE) 
relating error rate and minimum MSE. This is an extremely useful 
inequality since (MSEpo) as a function of channel characteristics is 
often explicitly known for different equalizer structures. 

It is also interesting to note (this has been pointed out before®) that 
the filter, W(t), that minimizes MSE also minimizes the upperbound 
on P.. This is true because the same quadratic functionals in W(t) are 
involved in the optimization of both expressions. 

We are now in a position to specialize our formulas to the various 
equalizer structures under investigation. 

The six examples that follow do not require the knowledge of 
channel phase characteristics to compute performance. Implicit in 
each of these schemes is the complete removal of phase distortion, 
which can be accomplished without noise enhancement. Only a mag- 
nitude characterization of the channel transfer response was available 
at the time the work reported here was done. While departure from 
flatness of the magnitude fundamentally affects performance, theoret- 
ically, departure of phase from linear has no effect on attainable 
performance. Therefore, the lack of phase characterization of the 
channels was not an obstacle to our study. However, a complex 
characterization of the channel would be useful in determining the 
minimum number of required taps in the designs of the equalizers. 


4.2 Pure phase equalization 


In this particular equalizer, Ni = Nz = 0 (where N; and N2 are the 
lengths of the precursive and postcursive cancelers, respectively). We 
choose W(t) so that 
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Ww) = e7#©), ee 
(w) =e [w| 7 
T 
T’ 
where W(w) is the Fourier transform of W(t) and ¢(w) is the channel 
phase characteristic. For this choice of filter, only the magnitude of 
the channel transfer function enters into the computation of the bound, 
as shown in eq. (13). 

Using the well-known Poisson sum formula along with some algebra, 
it is possible to write (13) more explicitly, i.e., 


p (H)? 
20°(L) 1+ p ((H — (H))?) J’ 


= 0,|w|> 


P.s2 exp|- (17) 


where we used the shorthand notation 
n/T 


and H is used in place of | H(w)|. 


4.3 Linear equalization 

Here again Ni = Ne (no canceler) and W(t) is chosen to minimize 
(6). The expression for the optimum MSE in this case has been shown 
to be® 

: > 

1+ pH? © 
This formula is directly used in (16) to calculate the upperbound on 
error rate: 


(MSE)o = 207(L) < (18) 


1 1 aa 
Pie Cex ae ee), 
exp] 20°(L) (<; + >} 


4.4 Inverse equalization 


In this case N, = N2 = 0 (no cancellation) we choose W(t) to be the 
inverse of the channel frequency characteristics, 


Ww) = Hw), Jo} <= 
T 
= 0, |w| >i 


Here the channel is clearly perfectly equalized so that intersymbol 
interference is completely eliminated; the penalty is increased output 
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noise power. For this simple scheme, it is possible to express the error 
rate exactly but for reasons of uniformity we use the upperbound 


1 
Pe <2 exp — iE) I (19) 
H? 


4.5 Decision feedback 


_For this equalization system N, = 0 and N2 = « and again we choose 
W(t) to minimize (6). In this type of equalizer, the causal or postcursor 
intersymbol interference is entirely eliminated and an expression for 
the optimum MSE is known,’*”’ as shown below. 


(MSE )o = o°(L)exp{—(In[1 + pH"])}. (20) 
This is used in (16) to express an upperbound on error rate. 


4.6 The ideal equalizer 


In this utopian scheme the precursor and postcursor cancelers 
become infinite, N; = N2 = ©, so that all the intersymbol interference 
is eliminated. In this ideal situation we obtain the very best possible 
result, namely, the matched filter bound, which is a lower bound on 
P.. This scheme assumes that it is possible to detect each pulse 
a,r(t — nT) optimally by a matched filter without incurring interfer- 
ence from all other pulses. The filter, W(t), in this case is chosen to be 
matched to the channel characteristic, i.e., 


W(w) = H*(w), |o| = 


T 
ro) 


=0 > 
, [of T 


where * denotes the complex conjugate. 
For this idealization the upperbound on error rate is simply, 


P.<2 exp{- ay}. (21) 


= 
20°7(L) 
No other detection scheme can do better. In the next section we use 
these formulas to calculate the various efficiency indices. 

Before concluding this section, we remark that there is one more 
easy case and one extremely difficult case that might be considered as 
candidates for making comparisons. Suppose that no filtering, other 
than out-of-band elimination of noise, were performed at the receiver. 
What performance can one expect? While we cannot answer this 
question exactly because channel phase characteristics are unavailable 
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at this writing, we would expect performance to be worse than remov- 
ing the phase characteristic entirely—a situation we will examine. 

The second approach, which is a very difficult one to analyze, 
involves the use of a finite-state Viterbi decoder. Nevertheless, we will 
report a bound on the performance of this processor. Specifically, the 
performance of an infinite canceler (the matched filter) is superior to 
the maximum likelihood (Viterbi) decoder. As shown later, for the 
channels considered, the matched filter bound is close in performance 
to decision feedback. Consequently, we shall see that the performance 
of maximum likelihood sequence estimation is tightly bracketed be- 
cause it is superior to decision feedback. 


4.7 Information theory bound on communications efficiency index 


In this section we discuss a formula for the maximum number of 
bits per cycle that can be attained for a given H(w). If H(w) were 
constant in frequency the formula for the efficiency index in bits per 
cycle would be simply 


I= log2(1 + p|H|’). 


It is reasonable to expect that if H(w) is frequency-dependent, the 
maximum efficiency index would be 


[= ‘ | loge(1 + p|H(w)|”)dw, (22) 


where the integral is over a frequency band of size Q = 27W. Indeed 
this is the case. To outline a derivation, we note first that A. Kolmo- 
gorov has generalized Shannon’s notion of capacity to provide a very 
fundamental definition that gives a useful starting point for developing 
capacity formulas in nonstandard situations such as the one at hand.” 
M. S. Pinsker™ was able to derive from the Kolmogorov approach a 
formula for the amount of information in a stationary Gaussian process 
about another stationary Gaussian process related to it. Specifically, 
if S,(w) and S,(w) are the power spectral densities of the processes 
and S,,(w) is the cross-spectral density, the formula for the amount of 


information is 
= __[Sx(w)/? 
i in(1 or ) o 


If we require the transmitter output to be Gaussian, then since the 
additive noise is Gaussian, Pinsker’s formula can be applied to the case 
where x is the transmitted process and y is the received process to 
obtain (22). Requiring the transmitter output to be Gaussian is really 
not a limitation since, when the additive noise is Gaussian, one can 
prove that capacity is attainable with a Gaussian transmitter output 
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using the methods discussed in Refs. 14 through 16. Thus, (22) gives 
the efficiency index formula. 
References 16 and 17 also provide approaches to establishing (22). 


4.8 Information theory limit on index when transmitter is optimized 


So far we have treated p, eq. (3), as a constant. In this section we set 
the stage for exploring the advisability of optimizing the output power 
spectral density to maximize the efficiency index. Since we are now 
allowing in-band shaping of the transmitter filter frequency character- 
istic, we will consider p as a function of w and write p to denote the 
previously considered case where p is constant over the band. 

Although the analysis in this section is focused on the information 
theory limit on the efficiency index, the decision feedback index 
involves the identical functional form and so our analysis will be 
applicable to decision feedback as well. 

We will compare the previously discussed index 


I(flat) = i loge(1 + p|H(w) |”)dw 
with 
I(opt) = a | loge(1 + po(w) | H(w) \")da, 


where po(w) is the function maximizing J under a constraint on f p(w) 
dw, the received signal-to-noise ratio in the absence of fading. This 
constraint is equivalent to a constraint on the transmitter output 
power, since in the absence of fading the channel has a flat loss 
characteristic. 

This optimization problem is known” and yields easily to the cal- 
culus of variations. The solution is called “water pouring.” The name 
stems from the graphical interpretation that if p{ is the constraint on 
J p(w) dw, the optimum p(w), which we denote by po(w), is obtained by 
forming a vessel with base |H(w)|~* and vertical sides at the band 
edges. One pours “water,” that is, area, of amount pQ into the vessel 
and po(w) is given by the depth of the water at w. It is clear that this 
construction obeys the constraint. Generally, if p is sufficiently small, 
the poured water will not touch both of the vertical sides of the vessel. 
In such situations, it would be advantageous to limit the transmitted 
power to a frequency band less than Nyquist. In our case, however, 
the unfaded signal-to-noise ratio is so great that, for the simulated 
channels, the water level always meets both vertical sides. In other 
words, the transmitted power always occupies the full Nyquist band. 
Thus, po(w) = A — | H(w)|~?, where A is chosen so that f po(w)dw = 
pQ, that is, AQ — f | H(w)|~-?dw = pQ or 
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pow) =pt+a al | H1(v) |Pdv — | (w)|~. 





Therefore, 
| loge(1 + p| H(w) |")dw 
I (flat) oes Ae ee ee 
T(opt 2 
- | log | | [Ht |-2dv + BLE) | do 


We expand the logarithm for p large to find an asymptotic represen- 
tation. We get, after a cumbersome derivation, 


-1 


i log2|H |?dus 
I (flat) _ 1 


I (opt) 2p"logep Q logep 


| Jame (Jan 2 
|A| || +o(=4.), 93) 


plogsp 


The last multiplier in the perturbation expression represents the 
variance associated with a random sampling of a specific | H(w)|~ 
This multiplier is zero if | H(w)|~? is a constant. We note, for later use, 
that for p = 10°, (i.e., a 63-dB s/n in the absence of fading) we have 


—~—) < 10. 
2p*logep 


4.9 Communication efficiency index 


The relationships in eqs. (17) through (21) are unifying expressions 
for the error rate for the five equalization cases. From Section 1.0 we 
have I = 2logeL and o7(L) = [(L? — 1)/3]. These two equations in 
conjunction with the P, formulas enable us to determine J as a function 
of the P. objective, p, the channel, and the equalization scheme. 


Specifically, 
_ G(H, p) 
I= lore] SEO | 


where G(H, p) is a function that depends on the communication 
method. With the channel response considered to be a random func- 
tion, J is a random variable, and we can determine its probability 
distribution function for each communication scheme. The quantities 
p and P, are parameters of the distribution. 
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V. MODEL FOR THE FADING CHANNEL 


Now we describe the mathematical model for frequency-selective 
fading owing to multipath reception. This mathematical model for the 
random functions | H(w) |? is due to W. D. Rummler’”’ and is based on 
measurements of a 26.4-mile hop between Palmetto and Atlanta, 
Georgia. The measurements of frequency-selective fades were made 
on a 25.3-MHz channel situated in the 6-GHz band during the heavy 
fading month of June (1977). 

Rummler’s model uses a two-ray representation of the signal, which 
was quite adequate for fitting the experimental records. (The model is 
not necessarily intended to depict the underlying physical mechanism 
for a fade. The true mechanism could involve a much more complex 
ray combination.) Also, it is not possible to deduce the phase charac- 
teristic associated with any particular amplitude characteristic. It has 
been experimentally determined that this kind of a channel cannot 
always be viewed as minimum phase.”® 

In the model, the | H(w) |” functions are 68-degree sections of scaled, 
displaced cosine waves. Specifically, conditional on a fade occurring 


| H(w) |? = a?|1 + b? — 26 cos(wr + 4)|, 


where: 
(i) b = 1/108” > 0 with B an exponential random variable with 
mean 3.8. 

(iz) The parameter, a, is a log normal random variable with de- 
pendence on the parameter, b. Specifically, a = 1/104”°, where A is 
normal with a mean of 24.6(B* + 500)/(B* + 800)dB and a standard 
deviation of 5 dB. 

(uz) The phase, 0, is independent of a and 6b and has a constant 
density on each section |6| > 7/2 and |0@| < 7/2 with P{|6| < 7/2} = 
5+-P{|6| > 7/2}. 

(tv) The scale factor 7 is a constant = 6.31 nanoseconds. 

In the model, the channel is in the faded state for 8060 seconds in a 
normal heavy fading month. Thus the channel can be viewed as being 
in one of two states where: 


P {unfaded state} = 0.99689 
P {faded state (Rummler model operative)} = 0.00311. 


In what follows we employ this model to estimate the outage 
distributions for various communication methods. The model should 
be regarded as a working assumption valuable in gaining initial insight 
into the potential of the communication techniques we consider. 
However, we emphasize that more measurements may be required to 
refine Rummler’s model to accommodate different geographical situ- 
ations and wider bandwidths than 25 MHz, and to sharpen the accu- 
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racy of the representation of the more severe fades that are of major 
concern in what follows. 


VI. OUTAGE OBJECTIVES AND SOME PROSPECTIVE INDEX VALUES 


From the proposed performance objectives for the digital transmis- 
sion network,’ we have that the round trip system availability objec- 
tive is 99.98 percent. So the probability of outage is 0.0002 round trip 
or 0.0001 one way. The 0.0001 breaks down as 0.00005 for fading, 
0.000025 for equipment failure, and 0.000025 for maintenance and plant 
errors. Thus, for a 4000-mile system composed of 156 hops, each with 
a nominal length of 25.6 miles, we get a per hop outage probability of 
3.2 x 107’ for fading. This corresponds to about 10 seconds of outage 
per year. If we assume the year is composed of three heavy fading 
months and nine months with no fading, we obtain that the probability 
of outage in a heavy fading month is 1.28 x 10~°. For a 250-mile short- 
haul system, the outage objective is 16 times less stringent on a hop, 
namely, 2.05 x 10™. 

For the purpose of discussion we shall later consider the possibility 
of accommodating two DS-3 digital signals in a 20-, 30-, and 40-MHz 
channel. Each DS-3 signal corresponds to 672 64 kb/s voice circuits, 
so that two DS-3 signals correspond to about 90 Mb/s. Thus, for 20-, 
30-, and 40-MHz channels we require 4.5, 3, and 2.25 bits per cycle, 
respectively. We will use 10~‘ as the probability of bit-error threshold 
for registering outage. The sensitivity to this threshold will also be 
analyzed. 


Vil. COMPUTER PROGRAM 


A comprehensive FORTRAN program was written to compute and 
display outage distributions. The program is composed of three main 
segments. 

The first segment simulates the power transfer characteristic for the 
channel in the faded state. It uses a PORT routine to generate random 
numbers uniformly distributed on [0, 1] and functions of these are 
evaluated to produce the random variables with the three densities 
underlying Rummler’s model. The variables A and B are appropriately 
correlated. The random channel characteristics are then computed 
and a file containing them is produced. The file contains 25,000 
characteristics. 

The second segment calculates the efficiency index for each channel 
and then computes the probability distribution function of the indexes. 
Various options and parameters can be chosen in exercising this stage. 
These include: 

(t) Method of communication (i.e., type of linear equalization, 
decision feedback equalization, MLSE, and the information theoretic 
optimum processing) 
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(tt) Transmitter spectrum, i.e., optimization of the transmitter 
spectral density for a given average power constraint or flat power 
spectral density 

(iti) Probability of bit-error objective 

(tv) Unfaded signal-to-noise ratio at the receiver input 

(v) Channel bandwith. 

Option (ii) is only available for the decision feedback and the 
information theoretic optimum communication schemes. Extending 
the option to the other schemes seemed inadvisable because of the 
closeness of the results, as will be seen later. 

The number 25,000 was found through computational experience to 
stabilize the density tail in the range of interest and yet not be wasteful 
of computer resources. Since the number 25,000 is very close to the 
number of experimental records of fade characteristics, we could have 
worked from original experimental data. We elected to work with the 
Rummler model since it is weighted to track the worse fades, which 
are our interest here, and since the model is widely accepted. 

The final segment of the computer program provides labeled plots 
of the outage distribution functions. It uses the graphic package 
DISSPLA. 


Vill. PRINCIPAL RESULTS 
8.1 Preparatory remarks 


For the purpose of presenting our principal results we will need the 
following notation for the outage distribution functions: 

Fpy: phase distortion removed 

Fri: optimum linear equalization 

F pr: postcursive intersymbol interference (ISI) removed 

Fur: all ISI removed (matched filter bound) 

Fir: information theory limit (Shannon). 

The efficiency index distributions were computed for 30-MHz chan- 
nels. Strictly speaking, the notion of an index in bits per cycle is 
imprecise in that F(Z) (the probability distribution of bits per cycle) 
would change if calculated at 20 MHz or at 40 MHz. However, by 
calculation we established that, for the purposes of the discussion that 
follows, treating F(Z) as invariant over the 20- to 40-MHz range of 
bandwidths is an adequate approximation. 

In the actual development of systems of the kind we have idealized, 
much more detailed performance analysis is required than that re- 
ported here. One important aspect we have not considered is the effect 
of excess bandwidth associated with practical filter designs with rolloff 
factors other than zero. To get a preliminary idea of the effect of rolloff 
of an amount a on the communication efficiency indexes, we would 
simply scale the distributions abscissas by an amount 1/(1 + a). 
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Equivalently, we would inflate the desired number of bits per cycle by 
1 + a before going to the curves. Thus, in considering the accommo- 
dation of two DS-3 signals, with a = 1/3, we would inflate the 4.5-, 
3-, and 2.25-bit per cycle values corresponding to 20-, 30-, and 40-MHz 
channels by 1.333 to obtain 6, 4, and 3 bits per cycle. We would then 
consult the derived curves at these values to obtain the outage prob- 
abilities. For the purpose of discussion in the section that follows, we 
use these three inflated bits per cycle values along with the long-haul 
and short-haul objectives of 1.3 X 10° and 2.1 x 107°, respectively, 
given in Section VI. Subsequently, an alternative means of accounting 
for a will be given. 


8.2 The graphs 


The most striking features of the outage distribution functions F'(/J; 
P., p) are exhibited in Fig. 2. The beneficial effects of adaptive 
equalization are apparent. The three equalization schemes yield 
roughly similar results; however, as one looks more closely at the 
extreme outage tail, Priv, For, and Fyr begin to depart from each 
other. Fr is displaced over two bits per cycle to the right of Fir, while 
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Fig. 2—Comparison of index distribution tails at 63-dB s/n, p.e. < 10 (p.e. does not 
apply to the Shannon limit curve). 
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F py is very substantially to the left of Fri. For the 40-MHz band (3 
bits/cycle), both outage objectives are met with linear equalization. 
For the 30-MHz band (4 bits/cycle) and the long-haul objective, linear 
and decision feedback equalization are not adequate and some coding 
or use of maximum likelihood sequence estimation are possible solu- 
tions. However, it may be practical to overcome the shortfall by some 
other means, such as improving the amplifier noise figure. For the 20- 
MHz channel (6 bits/cycle) the long-haul objective is not met. Also, 
this efficiency is so close to the information theory limit that any 
attempt to achieve it by coding may be ill-advised because of com- 
plexity. On the other hand, with some moderate coding the short-haul 
objective for 20-MHz channels should be attainable. For the other two 
bands, short-haul objectives are roundly met. 

The plot for the equalizer that inverts the channel is not shown, as 
it is not perceptibly different from that for the optimum linear equal- 
izer. This is expected since the optimum linear filter is essentially 
inverting the channel at the high signal-to-noise ratios we are consid- 
ering. 

Figures 3, 4, and 5 show the sensitivity of FU, P., p) to P.. The 
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Fig. 3—Index distribution tail sensitivity to probability of error for linear equalization 
at 63-dB s/n. 
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Fig. 4—Index distribution tail sensitivity to probability of error for decision feedback 
at 63-dB s/n. 


sensitivity is especially small at the low error rates needed for data (as 
opposed to voice) transmission. An asymptotic analysis shows that, for 
large p, the curves translate to the left in accordance with a log2N shift, 
where 10~” is the P, objective. This insensitivity is an illustration of 
the well-known result’ that once a pulse code modulation (PCM) 
operating point is achieved it takes a very small improvement to make 
the error rate an order of magnitude smaller. In fact, if at some 
operating data rate the probability of error turns out to be 10~° — 107°, 
one should be able to design an error-correcting code of small redun- 
dancy and moderate complexity that could improve the error rate by 
several orders of magnitude. 

Figures 6 through 9 illustrate the sensitivity to signal-to-noise ratio. 
The translation in all cases is roughly 1/3-bit/cycle/dB. Note the 
curves for the Shannon limit have an ordinate range of 4 to 10 bits per 
cycle, while the others range from 2 to 8 bits per cycle. No sensitivity 
for Fpy is given since, unlike all the other distributions, there is 
negligible improvement as p increases. This is because, as p increases, 
the effect of intersymbol interference (ISI) remains and nothing is 
being done to mitigate it. In the other four cases, ISI tends to be 


DIGITAL COMMUNICATIONS 449 





PROBABILITY THAT INDEX IS LESS THAN ABSCISSA 
r=} 
oa 





PROBABILITY OF 
ERROR <10 
4 5 6 7 


BITS PER CYCLE 


8 


Fig. 5—Index distribution tail sensitivity to probability of error for the case of all ISI 
removed. 


eliminated as p increases (the channels can be perfectly equalized 
without an inordinate amount of noise enhancement). 

In Section 8.1 we mentioned the (1 + «a)~ scaling as a method of 
accounting for rolloff. This assumed W= 1/T so that (1 + a)W is the 
actual bandwidth. Suppose instead that the real bandwidth is fixed at 
W but the data rate is slowed by an amount (1 + a), leaving the 
average transmitter power and N, constant. Then the true s/n is 
increased by 10 logio(1. + a). From this alternative perspective the 
suggested (1 + a)~’ scaling would be supplemented by a shift to the 
right of the probability distribution function tail by approximately (10/ 
3)logio(1 + a) bits per cycle. Whether in estimating the effect of rolloff 
one takes the perspective that the symbol rate or the bandwidth is 
fixed is a matter of convenience. 

Next, we consider optimization of the transmitter power spectral 
density. There is a practical question as to whether such an optimiza- 
tion could be achieved since the fade characteristic, which is first 
determined at the receiver, would need to be relayed back to the 
transmitter in time to be useful. However, the question is academic 
since we demonstrated that, even if an optimized transmitter could be 
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s/n = 54 dB 
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Fig. 6—Index distribution tail sensitivity to s/n for linear equalization, p.e. < 10‘. 


adapted in real time, the performance benefit would be negligible. 

To understand why it is not worthwhile to optimize the transmitted 
power spectral density, we first consider the information theory limit 
that was anlayzed in Section 4.7. Our detailed numerical work has 
shown that a plot of the tail of the index distribution for the informa- 
tion theory limit under the assumption of an optimized transmitter 
would be imperceptibly different from the Fpr tail plotted in Fig. 2. 
This closeness of the two distributions follows from the fact that, for 
the severest fades in our data base of 25,000 channels, |H|° is of the 
order of 10° and the terms involving |H|~* (~10~”) in the perturbation 
expression, eq. (23), are not enough to overcome the 10~“ multiplier. 

Since the decision feedback index has the same form as (22), we can 
also conclude that the distribution tail corresponding to decision 
feedback would not be significantly altered if the optimum transmitter 
were used. 

The decision feedback index and information theory limit on the 
index give imperceptible benefits when the power spectral density is 
optimized; therefore, it seems extremely unlikely that there is any 
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Fig. 7—Index distribution tail sensitivity to s/n for decision feedback, p.e. < 107‘. 


worthwhile benefit associated with optimizing the transmitter in the 
other cases. 


IX. INITIAL ESTIMATE OF THE EFFECT OF FREQUENCY DIVERSITY 


In implementations, digital radio systems are often protected with 
frequency diversity. In such systems impairments such as fading and 
equipment outages prompt the switching of communication traffic to 
a protection channel situated at a different frequency. The notation 
mxn means that m protection channels back up n working channels. 
So long as a protection channel is not occupied by an impaired channel, 
or is not itself impaired, it is available for temporary use in any of the 
n working systems. Some illustrations are 2 X 10 and 1 x 11 at 4 GHz, 
while at 6 GHz 2 X 6 and 1 X 7 are examples. 

For FM systems the factor expressing the improvement in outage 
associated with frequency diversity is given by the expression 100/DG. 
foH” in eq. (24) [corresponding to (34) and (35) in Ref. 21]. The 
parameter fo represents the carrier frequency in gigahertz and D 
denotes the path distance measured in miles. The parameter G depends 
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Fig. 8—lIndex distribution tail sensitivity to s/n for the case of all ISI removed, p.e. 
< 10%. 


on the details of the frequency protection. G incorporates combinato- 
rial effects corresponding to using m channels to back up n as well as 
empirical expressions involving the individual frequencies of channels 
involved. The term H is commonly expressed in decibels as —20 log H 
and is called fade margin. The fade margin is the smallest loss relative 
to the unfaded received signal at which the system fails. The notation 
H for the voltage level agrees with the previous use of H in this paper 
so long as the channel has a flat characteristic. 

As pointed out in Ref. 22 the notation of a flat fade margin is 
considered meaningless in digital radio systems since the frequency- 
selective aspect of the fade characteristic appears necessary to describe 
performance of a channel. As of this writing we are not aware of any 
method in the extant literature for extending our results to include the 
effect of frequency diversity. However, for the special case of optimum 
linear equalization there is a way to introduce an equivalent flat fade 
margin so as to enable the use of (24) in making a (preliminary) 
estimate of the diversity effect. The estimation method was discovered 
in the course of generalizing the computer program to compute index 
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Fig. 9—Distribution tail sensitivity to s/n for the Shannon limit. 


distributions for arbitrary bandwidths. The generalization was needed 
to investigate—and, as it turned out, to substantiate—the bandwidth 
insensitivity of the distribution in the 20- to 40-MHz range. The 
generalized program was also exercised for bandwidths an order of 
magnitude smaller and it was observed that the Fy: distribution 
changed imperceptibly.* So the Fr: tail in the range of interest here 
can be correctly obtained by using the univariate samples | H(0)|’. 
Treating (18) as an equality and solving for |H(0)|* and then substi- 
tuting in (24) gives an estimate of the improvement owing to frequency 
diversity. 

For an illustration refer to Fig. 2 for which s/n = 63 dB and Pe < 
10-*. We see that with linear equalization, at the long-haul outage 
objective of 1.3 x 10°°, 3.2 bits per cycle can be supported and the 
corresponding number for the short-haul objective is 5.3 bits per cycle. 
Using (24) and the G values from Ref. 21 we show in Table I the 


* This is not true of the individual values of J and no claim is made for invariance of 
For, Fyp, or Frr. 
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Table |—Estimates of improved bits/cycle indices 
using frequency diversity 
Channel at 4 GHz Channel at 6 GHz 


Short Haul (1 X 11) 7.1 (1 X 7) 6.5 
(2 < 10) 78 (2 x 6) 7.1 
Long Haul (1 x 11) 5.3 (1x7) 4.6 
(2 X 10) 5.8 (2 x 6) 5.3 


following estimates of improved indices when frequency diversity is 
employed. 
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XI. POSTSCRIPT 


Application of multilevel QAM in the radio channels might be 
inhibited by the amplitude (AM-AM) and (AM-PM) nonlinearities 
present in RF power amplifiers. A method for solving this problem 
without sacrificing amplifier power efficiency will be described in a 
forthcoming paper.” 
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Chromatic Dispersion Measurements in Single- 
Mode Fibers Using Picosecond InGaAsP 
Injection Lasers in the 1.2- to 1.5-um Spectral 
Region 
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D. L. PHILEN 


(Manuscript received September 23, 1982) 


We describe the use of picosecond InGaAsP injection lasers for 
measuring chromatic dispersion in single-mode fibers in the 1.2- to 
1.5-um spectral region. Injection lasers at various wavelengths and a 
single-mode fiber-to-fiber switch are used in the pulse delay and 
pulse-broadening measurements. The simplicity and the compactness 
make the setup useful for field measurements and quality control. 


1. INTRODUCTION 


Modal and chromatic dispersion measurements in multimode and 
single-mode fibers provide important information about the bandwidth 
limitations in optical fiber transmission. A near-infrared, fiber Raman 
laser with subnanosecond pulses in the 1- to 1.7-~m region can be used 
for dispersion measurement in both multimode and single-mode fi- 
bers.'* While this infrared-fiber, Raman-laser-based measurement 
system has been very useful and widely adopted, it has its limitations 
in terms of time resolution (by the mode-locked laser pulsewidth, ~140 
ps) and is not suitable for field measurements because of its large size. 

In this paper we describe a simpler dispersion measurement system 
based on picosecond InGaAsP injection lasers and ultrafast InGaAs 
p-i-n detectors. This is similar to the measurement system we used for 
measuring high-bandwidth multimode fibers,* except now in order to 
study the chromatic dispersion in single-mode fibers by pulse delay 
measurements, we need to use injection lasers at different wavelengths 
in the 1.3-um spectral region. Besides being more compact, this system 
also has a better time resolution than the fiber Raman laser setup. 
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li. EXPERIMENTAL APPROACH 


The experimental approach is straightforward. Figure 1 shows the 
setup in which several InGaAsP injection lasers with wavelengths in 
the 1.2- to 1.5-um spectral region are used for pulse delay and pulse- 
broadening measurements in single-mode fibers. A picosecond electri- 
cal pulser is used to drive one or two injection lasers at a time to obtain 
optical pulses of 30 to 80 ps in duration. The laser pulses coming out 
of single-mode fiber pigtails are selected by a low-loss, single-mode, 
fiber-to-fiber switch’ and sent through the test fiber, the output of 
which is detected with a pigtailed InGaAs fast-pin photodiode.® The 
optical pulsewidth and wavelength-dependent pulse delay information 
can be stored in the digital oscilloscope for further processing. 

The InGaAsP injection lasers used are supplied by Lasertron Co. 
and have their center wavelengths near 1.21, 1.26, 1.315, 1.335, and 
1.525 pm when pulsed to give short optical pulses. The technique of 
generating short optical pulses is that of gain-switching,”® which man- 
ifests itself as controlled relaxation oscillation? and requires proper 
adjustment of the pumping level to give a single, short, optical pulse. 
Either the short electrical pulses from a comb generator’ and step- 
recovery-diode circuit’ or a high radio frequency (RF) sinusoidal 
drive’! can be used as the pump, except it has been shown that for low 
repetition frequencies (a few hundred MHz or less), short electrical 
pulse pumping results in shorter optical pulses.’ The laser spectral 
width is typically ~7 nm owing to multi-longitudinal-mode oscillation 
in the short-pulse generation.”® 

Figures 2a and b illustrate a typical measurement result obtained 
with a 6-km-long single-mode test fiber. The laser pulses at 1.335 and 
1.315 pm are selected with the single-mode fiber switch and are sent 
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Fig. 1—The schematic of the experimental setup. A 4 x 1 switch is shown. Five 


injection lasers are actually used in our preliminary measurements. The fifth laser has 
a connectorized fiber pigtail for direct connection to the test fiber. 
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Fig. 2—The pulse pair at 1.335 um and 1.315 ym (a) before and (b) after the 6-km- 
long single-mode fiber. 


through the test fiber. The pulses before and after the fiber as recorded 
and stored are shown in Figs. 2a and b, respectively. These are both 
doubly exposed oscilloscope pictures showing the relative pulse delay 
change and pulsewidth change owing to dispersion in the test fiber. 
Note that the pulse at 1.315 um being near the minimum dispersion 
wavelength, Ao, experiences no broadening, while the pulse at 1.335 um 
shows considerable broadening owing to chromatic dispersion in the 
fiber. The display is adjusted to show the relative delay difference 
between the two optical pulses before and after the test fiber. The 
change in the delay difference (Ar) is ~60 ps, with the 1.335-um pulse 
experiencing more delay because the 1.315-um pulse lies much closer 
to the minimum dispersion wavelength, Ao. Figures 3a and b show 
similar results for the pulse pair at 1.21 and 1.315 pm. The 1.21-um 
pulse has experienced more delay and broadening than does the 1.315- 
pm. pulse. In time, the 1.21-um pulse falls behind the 1.315-um pulse 
after the 6-km fiber, even though it is ahead of the latter by more than 
2 ns. The relative delay change Ar is 3.1 ns. 

Similar wavelength-dependent pulse-delay change and pulse-broad- 
ening results are obtained for the pulses at 1.26 and 1.525 um with 
respect to 1.315 ym. The experimental results Ar(A) (in ns/km) are 
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Fig. 3—The pulse pair at 1.21 wm and 1.315 pm (a) before and (b) after the 6-km-long 
single-mode fiber. 


plotted as dots in Fig. 4a. The solid line is a third-order, Chebyshev 
polynomial fit (F(A) = a + bA + cd? + dd*) on an HP-85 computer. 
The derivative of this fitted polynomial is a second-order polynomial 
whose root (zero-crossing point) gives Ao, the minimum chromatic 
dispersion wavelength. Figure 4b plots the obtained chromatic disper- 
sion M(A) in ps/nm-km. The obtained Ao is ~1.314 wm + 0.004 pm, in 
reasonable agreement with the 1.309-um value measured with the fiber 
Raman laser setup. 


lil. DISCUSSION AND SUMMARY 


Compared with the fiber Raman laser setup, the combination of the 
picosecond injection lasers and the single-mode fiber switch provides 
a simple, compact, measurement setup for single-mode fiber chro- 
matic dispersion measurements. In addition, the setup has a better 
time resolution owing to the short pulse duration and the jitter-free 
characteristics. While it has a lower dynamic range than the fiber 
Raman laser, with its 10- to 15-dB dynamic range, typical single-mode 
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Fig. 4—-(a) The time delay vs. wavelength data (dots) with respect to the reference 
wavelength 1.315 pm. The solid curve is the fitted third-order polynomial. (b) Chromatic 
dispersion obtained by differentiating the fitted polynomial of (a). 


fiber lengths of 5 to 10 km can be measured over the 1.2- to 1.5-ym 
spectral range, and more than 20 km can be measured over the low- 
loss region close to 1.3 and 1.55 pm. In practice, the discrete wavelength 
coverage limits the wavelength resolution. For studying fibers with 
new or unconventional dispersion characteristics over a wide spectral 
range, more injection lasers are needed (e.g., 10) for more wavelength 
coverage. In single-mode fibers for which the design phase has passed 
and a specific design concerning chromatic dispersion is established, 
this setup is useful for quality control and field testing by measuring 
the pulse delay and pulse broadening at a number of selected wave- 
lengths. 
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High-Frequency Impedance of Proton- 
Bombarded Injection Lasers 
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Experimental and theoretical results are given of an investigation 
of the capacitance behavior with frequency of GaAs injection lasers. 
It is shown that, for shallow-bombarded stripe geometry lasers, the 
zero-bias capacitance decreases rapidly beyond a certain frequency. 
This is interpreted in terms of confinement of the low-level radio 
frequency current under the stripe at high frequencies. Comparison 
of the experimental results with the analytically derived expressions 
provides a measure of the material resistivity adjoining the active 
region. This inferred material resistivity is shown to be in good 
agreement with results obtained by more direct measurements. Fi- 
nally, the general conclusions are also applicable to other optoelec- 
tronic devices operating at high frequency, such as light-emitting 
diodes. 


I. INTRODUCTION 


Injection lasers are inherently suited for high-frequency (>50 MHz) 
applications.’” In the Bell System, the commonly used GaAs double- 
heterostructure (DH) stripe geometry laser is obtained by either 
shallow- or deep-proton bombardment.** For deep bombardment, the 
active stripe is well defined electrically by the high-resistivity material 
that confines it. In the case of shallow bombardment, as well as oxide 
stripe geometry lasers, current can flow laterally, beyond the stripe, 
through the conductive cladding layer.” ® 

Previously, the impedances of several laser geometries were meas- 
ured at high frequencies, and the results interpreted in terms of 
phenomenological equivalent circuits.” * In this paper, we will restrict 
the analysis and measurements to zero-biased shallow-bombarded 
laser diodes, for simplicity. We will show that, for shallow-bombarded 
stripe geometry lasers, the low-level (radio frequency) RF current at 
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high frequencies is confined under the stripe. The interpretation of 
measurements in terms of the fundamentally derived analytical expres- 
sions provides a useful measure of relevant material properties. Finally, 
although the main emphasis is on stripe geometry lasers, the analysis 
is applicable to other optoelectronic devices. Thus, the approach can 
be applied to predict the small-signal, unbiased impedance of light- 
emitting diodes (LEDs) and photodetectors operating at high frequen- 
cies. 


ll. EXPERIMENT 


The injection laser is a standard GaAs DH grown by either liquid 
phase epitaxy (LPE) or molecular beam epitaxy (MBE) technology.’*” 
It consists of 2-ym-thick N-Gai..Al,As cladding, 0.1- to 0.2-~m 
Ga,Al,As active, 1.5-um P-Ga;.,Al,As cladding, and, finally, 0.5-um 
p-GaAs contact layer. Shallow-proton bombardment is used to define 
stripe widths of either 5- or 10-um dimension. For these measurements, 
it is essential to obtain an accurate measure of the stripe width, 2S, 
and the distance of separation, ds, between the active region and the 
semi-insulating damaged region. This is done by etching the mirror 
facets in a dilute A/B solution.’® An example of such an etched mirror 
obtained by a scanning electron microscope (SEM) is shown in Fig. 
la. Figure 1b is a schematic of the various layers. The distance, da, is 
between the unetched region and the active layer. It is assumed that 
the etch stops abruptly where the conductivity changes from p-type to 
semi-insulating. (In general, this is a fair assumption, although it may 
not be accurate to better than 0.1 pm.) 

Every diode is subjected to dc current-voltage and ac impedance 
tests. The dc measurement determines rectification properties, and 
current-voltage dependence over the range between 10°” to 10°” 
amperes. The ac impedance test used RF signal levels, at or below 50 
mV, performed on unbiased diodes. At those RF signal levels, the 
diode conduction current is less than 10~"’ amperes. The laser imped- 
ance is measured at zero bias over a frequency range extending from 
10* to 5 x 10’ Hz. Over this frequency range the capacitive current is 
typically larger than 10~° amperes. Hence, in the present impedance 
measurements, the junction capacitive current is well over 10° times 
greater than the diode conduction current. Therefore, for our purposes, 
the junction can be assumed to be a capacitance with negligible 
rectified current flow. 

Usually, two capacitance bridges are used at different frequency 
ranges, with an overlap in frequency to ensure accuracy. These imped- 
ance bridges measure the equivalent parallel capacitance and con- 
ductance of the diode. As an example, Fig. 2 shows the results of two 
such measurements and two experimental values when results were 
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Fig. 1—Miror of shallow-bombarded stripe geometry DH GaAs layer after etching 
in dilute A/B etch. (a) Photomicrograph of the etched mirror as obtained by an SEM. 
In this case, the p-active layer has also been etched. (b) Layer schematic of the same 
mirror. 


obtained on two different instruments. In Fig. 2a, the zero bias capac- 
itance of diode Al4 (grown by LPE) is constant as a function of 
frequency up to 1 MHz. Above 1 MHz, the equivalent parallel circuit 
capacitance decreases rapidly with an increase in frequency. The same 
general behavior is observed on another diode from a different slice. 
The results of diode B22 (grown by MBE) are shown in Fig. 2b. The 
zero bias capacitance decreases abruptly for frequencies in excess of 
10° Hz. We also find that, for both diodes, the equivalent parallel 
circuit conductances G increase with frequency f according to the 
relation G « f”, where n is between 1.25 and 1.5. However, accurate 
measurements of conductance are more difficult to obtain than those 
of capacitance. 

The results of Fig. 2 show the abrupt drop in capacitance at fre- 
quencies that may differ by as much as an order of magnitude. 
Furthermore, some devices exhibit a more gradual reduction in the 
capacitance-frequency behavior.’’ Figure 3 shows an example of such 
a soft roll-off in the measured capacitance. The solid curve is the 
modified analysis, which accounts for asymmetry and metallization 
capacitance. The dashed theoretical curve is for the ideal symmetric 
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Fig. 2—Capacitance dependence on frequency for (a) diode A14 and (b) diode B22. 


The solid curves are obtained from analytical expressions. In (a) and (b), da is 0.6 and 
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Fig. 3—-Capacitance dependence on frequency for a severely eccentric stripe geometry 
device. 
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model. The distance between the stripe and the two edges, J, and l:, 
respectively, as well as the thickness of the damaged region, dp, are 
obtained from SEM measurements. Here, the high-frequency capaci- 
tance does not decrease as rapidly as that for the cases shown in 
Fig. 2. 


ll. ANALYSIS 
3.1 The ideal symmetric case 


Consider first the simple and ideal case of a stripe centered within 
the chip. In addition, assume that the proton-damaged region is 
sufficiently thick (greater than 2.5 micrometers), so that the only 
relevant capacitance is the junction capacitance. The RF current flows 
through the diode as shown schematically in Fig. 4a. There is a direct 
stripe capacitance, C,, defined by the width 2S and length L of the 
stripe. The RF current flows through the adjoining cladding layer, 
which, together with the junction, forms a transmission line. This 
transmission line is shown schematically in Fig. 4b. It consists of a 
distributed resistance, r, and a distributed capacitance, c. The analysis 
of such a transmission line is straightforward. The cladding sheet 
resistance, r;, is equal to p/d,4, where p is the average material resistiv- 
ity. The capacitance per unit area is ca given by Co/A, where C> is the 
low-frequency total diode capacitance and A is the total diode area. In 
the equivalent distributed parameter circuit shown in Fig. 4b, the 
resistance per unit length r is equal to r;/L. Similarly, the capacitance 
per unit length c is equal to c,L. At any point x, the current and 
voltage are given, respectively, by 


i(x, t) = I(x)e” (1) 


_PROTON 
~~" DAMAGED 





(a) (b) 


Fig. 4—Schematic of shallow-proton-bombarded, ideally symmetric, stripe geometry 
laser and equivalent circuit. (a) The various layers and the electrical elements. (b) The 
distributed parameter transmission line underneath the proton-bombarded region. 
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and 
v(x, t) = V(x)e. (2) 


The spatial parts of the current and voltage are related by the 
equations 


dV 
and 
d : 
dz = —jwcV. (4) 


These are almost standard transmission line equations that have to 
be solved with the boundary condition J(?) = 0, where ¢= (W/2)-S. 
This open-circuit boundary condition is appropriate when surface 
leakage is negligible. The input admittance, Yo, is given by 


Yo = Go + jBo, (5) 
where 
_ > sinh ¢ — sin } 
Que 2ré cosh ¢ + cos @’ 63) 
_ > sinh ¢ + sin @ 
~ 2ré cosh @ + cos > $80) 
and 
o = AW2wrc. (6c) 


From the susceptance given in (6b), an equivalent distributed par- 
allel capacitance, Cz, is obtained in the form 
Ca_1sinh¢+sing (7 
Co cosh¢+cos¢@’ ) 
where C6 = Ca¢L is the low-frequency capacitance of the junction 
whose area is ¢L. A plot of Ca/Co is shown in Fig. 5 as a function of 
¢. It is seen that the equivalent distributed parallel capacitor value 
remains equal to the low-frequency value as long as ¢ < 1. However, 
for @ > 1, the capacitance decreases at the rate of 1/¢. This decrease 
at high frequency is due to the fact that as the frequency increases, a 
relatively larger fraction of the reactive current is shunted by the 
distributed capacitance at small distances. At sufficiently high fre- 
quencies most of the current is bypassed at small distance by the 
capacitance. 
The choice of a parallel circuit, unfortunately, does not provide a 
straightforward connection between conductance and the material 
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Fig. 5—Equivalent parallel circuit capacitance and conductance for the distributed 
transmission line of Fig. 4b. The variation is in terms of the normalized parameter, ¢, 
defined in the text. 


resistivity. Nevertheless, let 9 = Lda/pé, which is the dc conductance 
between x = 0 and @ Then, Go/go is plotted as a function of ¢ in Fig. 
5. It is seen that the normalized conductance increases as ¢*. Since @ 
varies as f’”?, it follows that the conductance increases as f’, which is 
somewhat faster than the experimentally observed variation. 

The above results can be applied to the ideally symmetric stripe 
geometry configuration of Fig. 4a. The total equivalent parallel capac- 
itor of the diode C,, comprising the direct stripe and distributed 
capacitances, respectively, is given by 


A; (:-3) ee 


Gin ee 3 " $(cosh ¢ + cos ¢) |’ 


A (8) 


where A, = 2SL is the area of the stripe, Co is the total capacitance of 
the diode at low frequency, and the rest of the parameters are as 
defined previously. 

Figure 2a shows a plot of eq. (8), with a sheet resistance chosen at 
a value of 6.5 x 10° ohms. The stripe width is 5 ym, the diode width is 
250 pm, and the length is 380 um. The stripe width is obtained by 
etching the murror, as in Fig. 1. The only adjustable parameter in the 
curve of Fig. 2a is the sheet resistance. A similar theoretical curve is 
plotted in Fig. 2b for diode B22. Here, the stripe width is 10 ym, and 
the sheet resistance is 1.6 X 10* ohms. It is seen that the capacitance 
in Fig. 2b starts to decrease at a lower frequency because of the higher 
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sheet resistance. The high sheet resistance in diode B22 is due to the 
smaller distance of separation between the damaged region and the 
active layer. 


3.2 Asymmetric stripe with metallization capacitance 


Two deviations from the earlier symmetric model are observed in 
some devices. First, the stripe can be eccentric. Second, the proton- 
damaged layer can be as thin as 0.5 micrometers. These two factors 
cause the high-frequency behavior to deviate from the ideal model. 
This has been noted in independent measurements by C. W. Thompson 
et al.” 

The metallization can contribute a significant additional capacitance 
if the metallization is separated from the conducting semiconductor 
by a thin insulating layer. Such a condition can exist in oxide stripe 
geometry lasers as well as in some proton-bombarded lasers. However, 
this capacitance has unique characteristics, since it is not operative at 
low frequencies, where the electrodes are virtually connected. But at 
high frequencies, charge storage between the metal and the semicon- 
ductor can take place. A schematic of the various layers is shown in 
Fig. 6a. The proton-damaged layer thickness is d,. This semi-insulating 
layer can, at high frequencies, contribute a metallization capacitance 
Cn given approximately for GaAs by 








Cn =10/dp pF, (9) 
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(b) 
Fig. 6—Schematic of stripe geometry with thin damaged layer. (a) Charge buildup. 
(b) Equivalent transmission line circuit. The metallization capacitance results from 
charge storage across the proton-bombarded region. 
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where d, is measured in micrometers, and the chip is 250 x 380 pm, 
which is typical for our diodes. For oxide-defined stripe geometry 
lasers, the metallization capacitance is associated with the oxide layer 
itself. 

The metallization capacitance is not effective at low frequencies. 
This is obvious from Fig. 6a, where conduction through the stripe 
region bypasses the charge across the damaged region. The equivalent 
transmission line circuit is shown schematically in Fig. 6b.’° The circuit 
parameters are as defined previously, and solution of the transmission 
line equations follows the usual procedures. The total current branches 
into two components, Jp and Jo, respectively. The two admittances are 
given by 


,  wee'e sinh ¢ — sin @ 

as (c +c’) a @ + Cos | Te) 
sinh ¢ — sin } 
— Se as ee eas 10 

i vee| ae o + cos | a0) 

,_ wee’? a sinh ¢@ + sin @ 
Bor (c +c’) | (cosh @ + cos 5 | ine) 
Bo = wet (sinh ¢ + sin ¢) (104) 


(cosh $ + cos $)’ 
where 

$ = A2ar(c +c’), 
c’ = C,,/W, (10e) 


and W is the total width of the chip. 

Since the two admittances are in parallel, the equivalent parallel 
circuit capacitance is 

Co sinh ¢ + sing 

= ———_ } Co ——— + Ch], 

Co+ Cn (cosh @ + cos ¢) 
where Cp is the low-frequency junction capacitance, and C,, is the 
metallization capacitance, as given in (9). It is clear from (11) that at 
low frequency, the ¢ function goes to one, and the capacitance becomes 
equal to the low-frequency junction capacitance. On the other hand, 
at high frequency, the function goes to zero, and the capacitance 
becomes equal to the metallization and the junction capacitances in 
series. 

When the stripe is located off-center, the capacitance can be calcu- 
lated in a straightforward manner. The total chip capacitance becomes 


= As 1-—A,/A Co 
Cr=Co ‘5 + (a4) a (GF + toF2) + on |. (12) 


Cc (11) 
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where 


= sinh di + sin Qi 
sa ¢i(cosh ¢; + cos ¢:)’ i) 
oi = GV 2wr(c + c’), (14) 
A4+4+2S= W, (15) 


4 and ¢ are the lengths of the two distributed line sections, respec- 
tively, 2S is the stripe width, A, is the total area of the stripe, and A is 
the area of the diode. 

An example of a case of extreme asymmetry is shown in Fig. 3. 
Measurement of eccentricity indicates that the stripe is displaced by 
=~70 pm from its center position. The measured thickness of the proton- 
damaged region is 0.9 um, which results in a metallization capacitance 
of 11 pF. When the expression for the asymmetric case, as given in 
(12), is applied to the results of Fig. 3, the agreement between theory 
and experiment is adequate for a sheet resistance of 6 X 10°Q. On the 
other hand, the symmetric model, shown as a dashed curve in Fig. 3, 
cannot fit the data over the extended frequency range. In addition, the 
symmetric model overestimates the sheet resistance. 


3.3 Material resistivity 


Having obtained a value of sheet resistance and layer thickness, da, 
the material resistivity can be derived. However, care should be 
exercised in accounting for proton damage spread into the conductive 
layer. 

If y = 0 is the point at the edge of the damaged region where the 
local carrier concentration is (mathematically) zero, then for y > 0, the 
carrier concentration does not recover abruptly as a step function. 
Instead, the carrier concentration actually recovers as an error function 
of distance.’ For our purposes, this recovery can be represented 
approximately by the relation 


P(y) = po(l — e”), (16) 


where pp is the background-free hole concentration, and yo determines 
the rate at which carrier removal decreases with distance. 

For current flow parallel to the junction plane, the average resistivity 
p over a thickness daz is 


p= oo -2 E — exp(- daly |, (17) 


A 


where pp is the resistivity of the undamaged material. From published 
data of proton-bombarded GaAs, it would seem that the value of yo is 
0.18 + .03 ym.’*4 
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The above equations can now be applied to the results of Fig. 2. For 
diode Al4 we obtain p = 0.39 Q-cm; and po = 0.3 2-cm. For diode B22, 
we obtain p = 0.34 Q-cm, and po = 0.17 Q-cm. Several diodes from these 
slices were measured. The results were averaged and given in Table I. 
The average resistivity in the cladding, adjacent to the active region, 
in slice A is 0.28 + 0.09 2-cm. Direct measurement of resistivity in this 
material by J. L. Zilko gives a value of 0.21 Q-cm.” Hence, to within 
statistical diode-to-diode variation, the agreement is adequate. The 
resistivity of the cladding in slice B, given in Table I, is 0.18 + 0.02 Q- 
cm. These resistivities can also be used to estimate the free-hole 
concentration. For the typical 40-percent aluminum in the ternary, the 
hole mobility should be about 100 cm?/Vs.”"* The deduced-hole 
concentrations for slices A and B are listed in Table I. To check these 
deduced values with direct-hole concentration measurements, a third 
slice C was evaluated. The hole concentration, in slice C in the 
cladding, as deduced from the capacitance-frequency measurement, 
was (2.1 + 0.6) x 10’? cm™. On the other hand, hole-concentration 
measurements on this slice, by the automatic feedback method,” gave 
values of (1.9 + 0.4) x 10'7 cm™*.” Hence, again the capacitance- 
frequency method gave values consistent with those obtained by other 
methods. 

Stripe geometry lasers with deep-proton bombardment were also 
evaluated. In these devices the proton damage extends beyond the 
active region. Here, as opposed to the results for shallow bombardment, 
the capacitance remains small and nearly invariant with frequency. 
This again lends support to the model of RF current confinement in 
shallow-bombarded stripe geometry lasers. 


IV. CONCLUSION 


Experiments conducted on shallow proton-bombarded stripe geom- 
etry lasers show that the capacitance decreases abruptly above a 
certain frequency. These results are interpreted in terms of a distrib- 
uted parameter transmission line model. It is shown that the frequency 
beyond which the capacitance decreases depends on: the sheet resist- 
ance of the material above the active region, capacitance per unit area, 


Table I—Material evaluation obtained from capacitance frequency 


measurements 
Slice Average 
Slice r; ohms p ohms-cm da, pm Po oOhm-cm Po cm™? 
A (6 = 2) x 10° 0.37 + 0.12 0.6 0.2840.09 (2.5 40.9) x 10” 


B (15401) 10' 0.3640.03 0.224002 0.1840.02 (3.4+0.4) x 10” 
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and the dimension of the diode. These analytical expressions, when 
applied to experimental results, allow one to obtain an estimate of the 
material resistivity adjoining the active region. The deduced values of 
resistivity and doping concentration are in good agreement with the 
values obtained by other methods. 

Although the analytical results are applied to laser structures, they 
are also applicable to other optoelectronic devices. For instance, the 
same general conclusions can also be drawn about the small signal 
impedance of LEDs operating at high frequency. For configurations 
other than the simple stripe geometry, the spatial distribution of RF 
current obviously changes. However, the qualitative results are the 
same. 
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A commercially viable GaAs device technology for field-effect tran- 
sistors, integrated circuits, and lasers is critically dependent on the 
availability of high-quality, single-crystal boules with controlled di- 
ameter. We have modeled a diameter control scheme based on the 
monitoring of crystal weight for liquid-encapsulated Czochralski 
(LEC) growth of GaAs. The presence of the B2O3 liquid encapsulant 
and significant capillary forces make the direct interpretation of the 
weight-gain signal and its time derivative (DWGS) more complicated 
in comparison with pulled materials such as oxide crystals. We have 
formulated a realistic model for the LEC growth of axisymmetric 
crystals and have derived the differential equation relating the time 
evolution of the DWGS to radius and length. We show that the 
magnitude of the DWGS at the crystal’s “shoulder” is inversely 
related to the radius of curvature. Furthermore, the meniscus by itself 
gives rise to a precursor or early warning in the signal, which means 
that the maximum in DWGS precedes the maximum in shape by a 
few hundred seconds. The existence of a secondary maximum or 
aftershock in the signal that is the sole consequence of liquid encap- 
sulation is also demonstrated. Excellent agreement has been obtained 
between DWGS and the signal predicted from the measured shape of 
a grown crystal. Thus, prospects for automatic diameter control are 
encouraging. 


l. INTRODUCTION 


GaAs is one of the key semiconductor materials that serves as a 
substrate for light-emitting diodes (LEDs), lasers, and field-effect 
transistors (FETs). Paralleling the development pattern of growth 
techniques for other single crystals, the issues of primary interest have 
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evolved from questions of quality (elimination of twinning and defects, 
doping uniformity) to that of economy of size. However, scaling up the 
dimensions of GaAs crystals grown by the liquid-encapsulated Czo- 
chralski (LEC) technique necessitates introducing sophisticated 
schemes for diameter control. By achieving satisfactory control, it will 
also be possible to run for extended periods of time with minimal 
supervision, and to attain higher crystal yields, as well as a reduction 
in defect generation brought on by shape change. 

Since diameter control in Czochralski growth has been the subject 
of an excellent recent review by Hurle,’ here a brief outline of previous 
efforts with respect to LEC growth will suffice. Unlike Si? and a wide 
variety of oxide crystals® (e.g., GGG, LiTaO3) for which successful 
diameter measurement and control systems have been developed, the 
realization of a viable system for the LEC growth of III-V compounds 
has been much more difficult to achieve. This is largely due to effects 
associated with the growth chamber under high pressure, the presence 
of the encapsulating layer of B203(/), and the phenomenon of anoma- 
lous density (dhiquia > Asotia) In Some Semiconductor materials. 

A number of approaches to the control problem have been taken. 
Pruett and Lien,‘ and van Dijk et al.> have employed an X-ray beam 
passing through the high-pressure growth apparatus for GaP, designed 
to make the melt a high absorber of radiation relative to all other 
materials in the radiative path of the system. The subsequent use of 
an image intensifier gave an accurate television picture of the growth 
interface. Small changes in diameter could then be electronically 
extracted from the video signal and utilized for its control.° 

Alternatively, considerable success has been achieved meeting the 
demand for large, closely controlled diameter GaP single crystals using 
a floating die technique. Cole et al.° employed a SisN, ring floating in 
the GaP melt beneath the B2O;(/). This modified form of Stepanov 
ring®’ creates a long-term stable growth regime, permitting diameter 
tolerances of +1 mm for a 50-mm diameter boule. The technique has 
also been applied to GaAs with different degrees of success, apparently 
depending on the crystallographic growth direction.® 

In a third technique, advocated by Bardsley et al.,? a weight signal 
or a derivative weight-gain signal obtained by detecting the apparent 
weight change of either the crucible or crystal during Czochralski 
growth is used to control the power output and consequently the 
diameter. Bardsley et al. have considered both the theory’ and its 
implementation by an analogue servo-system.’° Although these au- 
thors have proposed reasonable initial postulates and expressions to 
calculate the weight gain, the detailed analysis is limited only to small 
perturbations in diameter and the formalism precludes growth by the 
LEC technique. 
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For some time now we have routinely measured the apparent weight 
of LEC-pulled GaAs crystals with a high sensitivity load cell placed in 
the high-pressure chamber in series with the crystal pull rod. The 
output of the cell is recorded as a dc level. Then, the derivative of the 
signal is taken electronically with a device developed especially for 
that purpose, and is also recorded. This activity has served as a useful 
qualitative guide in the manual control of diameter. The major objec- 
tive of this paper is to develop the fundamental theory that governs 
the relationship between the shape of an axisymmetric crystal with 
arbitrary variations in its diameter (including the shoulder) grown by 
the LEC technique and the instantaneously detected derivative 
weight-gain signal. Besides gaining new physical insight, we expect 
that by means of these investigations we can focus on the important 
physical factors and analytical techniques essential to the eventual 
control of crystal diameter. 

As a first step, a tractable model is formulated for LEC growth with 
a meniscus. The treatment leads to an explicit expression for the 
derivative weight-gain signal exclusively in terms of the crystal cross 
section and its first and second derivatives, in addition to geometrical 
and material parameters. Next, the numerical techniques required to 
perform computer simulations are outlined. Then, the key features of 
the signal are investigated by using the probability density function of 
the lognormal distribution for the crystal contour. Among the effects 
considered in detail are the shape of the shoulder and the presence or 
absence of the B2O3(7) encapsulant and meniscus. Furthermore, the 
derivative weight-gain signal is predicted for four contour lines—gen- 
erated by consecutive 90° axial rotations—of a specially prepared 
GaAs boule and compared with the signal measured during growth. 
Finally, in light of these modeling calculations, the prospective tech- 
niques for diameter control are discussed. 


ll. THEORY 


In this section, by a consideration of the relevant features of the 
LEC growth process, a realistic model is developed for the derivative 
weight-gain signal in a form suitable for computer analysis. To make 
the calculations tractable, but without a serious loss of generality, the 
following set of assumptions and constraints has been introduced: 

(t) The crystal and crucible are axisymmetric with circular cross 
sections. Hence, the crystal’s radius, r, can always be prescribed as a 
function of the axial location, L, which is monotonically related to the 
time elapsed since seeding. - 

(iz) The crucible containing the melt and the encapsulant is a 
right circular cylinder of radius, R. Less restrictively, departures from 


DIAMETER CONTROL OF GaAs 479 


a right cylinder are permitted below the liquid level prevailing at the 
completion of growth. 
(uit) The effects of crucible and crystal rotation and the concomi- 
tant viscous drag are neglected. 
(iv) As the initial melt level falls by distance / (Fig. 1), the entire 
amount of the liquid is transferred to the solid. 
(v) Seepage from the B2O3(/) encapsulating column of mass, me, 
and volume, Vz, to the melt/crucible interface (walls) is minute. 
(vt) The solid-liquid interface is planar. This allows the separation 
of the associated heat transfer problem from the calculations. 
(vit) Capillary forces between the melt, B2O3(7), and solid result in 
a meniscus of height, h, with a subtended angle between the vertical 


INITIAL 
MELT LEVEL 


U 






~~B203(1) 


~~B203(1) 





(b) 


Fig. 1—(a) Schematic geometry of LEC pulling. (b) Enlarged view of the solid-liquid 
interface region. 
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and the tangent to the “skirt” of the meniscus (joining angle), a (Fig. 
1). ; 

(viz) As the crystal is pulled at constant rate p, contact with the 
melt is uninterrupted. 

(tx) Physical constants required in the calculations are average 
values representative near the melting point of the crystal. 

The preceding conditions in combination with the geometrical de- 
scription of LEC pulling in Fig. 1 and Archimedes’ principle permit 
writing the weight signal, W,, corresponding to detected crystal mass, 
m,;, aS the sum 


W,= msg = WwW, -W, +W3; +W, —-W; 
true buoyancy vertical weight of the buoyancy 
weight correction projection cylindrical _ correction to (1) 
to W, of capillary meniscus Ws + Ws 
force 
where 
W, = (m. + mg) g = mg | (2a) 
W2 _ V.ppg (2b) 
Ws = 2aro cos a (2c) 
W, = r*ahp-g (2d) 
W; = VinenPB (2e) 


The symbols m, V, and p without subscripts designate the mass, 
volume, and density of the growing crystal, respectively. The same 
letters with subscripts B, 4 e, and g refer to the B.O3 encapsulant, 
melt, and that portion of the crystal that resides either in the encap- 
sulant or the gas phase, respectively. The symbols o and Vinen represent 
the surface tension and meniscus volume, respectively. 

Since V, = (m — mg)/p, with the abbreviation ki = ps/p, Wi — W2 
in eqs. (2a) and (2b) can be transformed into 


Wi — Wo = m(1— kidg + kimgeg. (3) 


Then, differentiating eqs. (3), (2c), (2d), and 2e with respect to time, f, 
and using eq. (1), we obtain the derivative weight-gain signal (DWGS) 
in the form 








dwW./g dm dm, 27 dr 
Shi a8 a & 
it ( hi) at 1 + 2 0 COS a 77 
20 ‘ da dr 2 dh AVmen 
— 20 yin a SE + pe rah +r?) — pp a (4) 


It can be readily shown that dm/dt is a function of the other 
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variables. Clearly, if the crystal is axisymmetric, one has 
dm , aL 


a ae’ (5) 


Moreover, the continuity of growth provides the relation 
L=¢+pt—h, (6) 
or in derivative form 


aL d¢ dh 


a a ae ” 


In addition, the conservation of matter during crystallization as the 
liquid level falls leads to 





m= pemR*e— Vinen) (8) 
or 
dm _ 2 de _ AVinen 
ects . 9 
dt oe( me a dt we 


A combination of eqs. (5), (7), and (9) permits the elimination of 
d¢/dt; hence eq. (5) can be rewritten as 
dm ‘ 


kdVinen 9 2 dh 
Pape dt 


dt i dh : 


(10) 


where k = p,/p and R? = kR? — r’. 
Substituting eq. (10) into eq. (4) for the DWGS yields after several 
algebraic operations 








dW./g _ _ 2 dm Pal — ki)zp-r*R _ 2n0 hte da 
dt ‘dt dt R? zg dt 
AV nen p dh R? 
P a E = hi) pers o| Se rin ae Pe E — (1 — ki) z| 


wit 
+2 7s *| rtp. © - cos «| (11) 


In essence, the general form of the DWGS is given by eq. (11). Since 
all the variables directly depend on L or indirectly through r on JL, it 
is convenient to replace the derivatives d/dt by d/dL-dL/dt. Accord- 
ingly, we have 
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dW./g = dmz dL Pil — ki)np,r°R? _ 20 aoe da dL 
dt al da R? g dL dt 
dV men AL r? dh dL R? 
dL dt ja haber R :— po| Ota p-[1- =e) a 
dr dL 
us 12 
+ 20 aL de (rior © 3 cos «) (12) 


Equation (12) completely describes the DWGS once the few remaining 
unknown functions are evaluated. These are the axial derivative of the 
crystal mass outside the B.O3(¢4)(dm,/dL), the macroscopic growth 
rate (dL/dt), and the meniscus (h, dh/dL, Vmen, and AVmen/dL) in 
terms of r(L) and the required physical and geometrical constants. 
The crystal shape r(L) we consider here as an input signal and the 
choice of different functional forms will be postponed to the next 
section. 


2.1 Evaluation of dm,/dL 


From Fig. 1 we conclude that the total cylindrical volume occupied 
by the segment of the growing crystal in V., the encapsulant, and the 
meniscus are conserved. Thus we can write the integral relation 


L By 
| r’adL _ | r’adL + Vision + Va = aR?(L = Lg + h), (13) 


0 0 


where the two integrals on the left-hand side are equal to 


L 
| r?(L)adL. 


Lg 


Thus we can express m,/p as 


Ly L 
mz/p = i] r’adL = | r°adL + Vinen + Va — tR*>(L — Lg t+ h). 
0 0 


Differentiating the above equation with respect to L gives 


dint «jog 4 Wore ape (14% ale) yg 
dL dL 
To obtain dL,/dL we define a function ¢(L, Lz) = 0 by subtracting 
the right-hand side from both sides of eq. (13). Then, employing the 
well-known result for the differentiation of implicit functions we have 
dL, _ _-g/aL 
dL ~— ag/aL,’ 





(15) 
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where 





a6, AV. en. ah 
ap aL) + —" — oR? (1 + 5 


and 


ao 

aL, = —r?(Lg)t + aR’. 
The actual value of ZL, can only be determined by a numerical proce- 
dure. From eq. (13) we get 


L oi Lg 
{ r°adL + sides + Vines i r-dL 
PB 
os —(L+h) oa ny oe (16) 


where the left-hand side includes only known quantities. 

While the entire crystal is submerged in B2O3, mz, = 0 and 
dm,/dL = 0. In that case the counterpart of the volume balance in eq. 
(13) is of the form 


L 
| r’adL + Vp + Vinen = TR*(L +h + A), (17) 
0 


where A is the distance of the tip of the crystal from the B203(/) — gas 
interface. Rearranging eq. (17) yields 


L 
adh + + Viren 
0 PB 
~____,—_- (L+h) =A. (18) 


Comparing eqs. (16) and (18) we observe that the left-hand sides are 
identical. As the crystal protrudes through the encapsulant, A changes 
from positive [eq. (18)] to negative [eq. (16)]. This provides an impor- 
tant clue in the computation of Z,. The numerical solution of eq. (16) 
for Lz, starts when A = 0. 


2.2 Determination of the macroscopic growth rate, dL/dt 
The macroscopic growth rate given in eq. (7) can be rewritten as 
dL d¢dL_ dhdL 


de dae: aba 7 (19) 


Hence, by a rearrangement of eq. (19) dL/dt becomes 
dL Pp 


Se. 2 
dt 1-—d¢/dL + dh/dL ee 
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To eliminate the melt coordinate / we use the relation for the conser- 
vation of melt [eq. (8)], which gives 


L 
Vv “| pr "dL + peVmen 
m + P¢Vmen 0 
£= —_—_—————— = —________—_—., 21 
pe7R* pe7tR? ( ) 
Differentiating eq. (21) with respect to L yields 
dé _ r?— AVnen/dL 
dL kR? 7R* ~ 
Finally, introducing eq. (22) into eq. (20) gives the macroscopic growth 
rate in the form 


Es ns (Ps 
dt r?— dVimen »\ dh 
1- (e+ dL / ne) + 


2.3 Transformation of the crystal length to time coordinate 





(22) 


(23) 





In general, r is known as a function of L. On the other hand the 
DWGS is recorded as a function of time. Therefore, a transformation 
from L to ¢ is necessary in the analysis. From the continuity condition 
[eq. (6)] ¢ can be expressed as 


ez L+h(L) — &L) 
7 : 
Substituting eq. (21) into eq. (24) yields 


L 
| r°dL 
Vinen 
ns ee ee! /. (25) 


kR? aR? 


(24) 


2.4 Meniscus height and related properties 


Mika and Uelhoff* have determined the meniscus shape and inter- 
face height, h, occurring during Czochralski growth by a numerical 
solution of the Euler-Laplace differential equation. Although, in prin- 
ciple, numerical results for the meniscus could be generated concur- 
rently with the computation of the DWGS, here we prefer to rely on 
faster closed form solutions of the capillary equation. Fortunately, as 
shown by Mika and Uelhoff,” the analytical approximation of Tsivin- 
skii’* describes the exact values of h with excellent precision over a 
wide range of joining angles, a, and radii. We have verified that for 
a = -—10° and r= 0.5 cm the error in using Tsivinskii’s approximation 
is less than ~2 percent. Subsequent computer simulations have dem- 
onstrated that in practical situations, except when there is a very rapid 
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drop in radius, the stated limit on a is obeyed. The fact that for a short 
time after seeding r is less than 0.5 cm is of minor importance owing 
to the initial insensitivity of the DWGS to h. 

Therefore, Tsivinskii’s equations” for the interface height and shape 
will provide the basis of our DWGS analysis. In our notation Tsivin- 
skii’s equation for h is of the form 


h=A[(1 —sina + uv’)? — u], (26) 
where 


_ A cos a 
4r(L) 


and A is a constant. According to Egorov, Tsivinskii and Zatulovskii’s 
modification for LEC growth of Tsivinskii’s original work A is given 


by” 
ee ee (27) 
(pz — pa)g 


The joining angle a is the sum of the growing angle, ¢, and contact 
(wetting) angle, B (Fig. 1), i.e., 


a=d+B. (28) 


In general, £ is a constant, while the growing angle is a function of the 
derivative dr/dL according to® 





tan ¢ = a ¢@ = tan™ oe (29) 


iL dL’ 
Thus, h [eq. (26)] can be expressed in terms of r and dr/dL at the 
solid-liquid interface. 

Consequently, the required first derivative of h, dh/dL, includes the 
second derivative d’r/dL’. This can be readily shown by the substi- 
tution of the combined eqs. (28) and (29) into eq. (26). Then, using 
standard trigonometric relations for angle-sums and inverse functions 


and differentiating the expression thus obtained with respect to L we 
find 


dh A uv —2X 
b+ (Zz) | 


where 
1 d?r/dL? dr. 
Fr (ay ne ae 


aL 
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2 2 
4 ae — (5 cos 8 + sin é) 
r 
1+ (= ar) 


dL 
ot dr . 
+a (cos B- aL — sin 8) 


Another derivative of interest in eq. (12) is da/dL. From eqs. (28) 

and (29) we have 
da _ d(tan™'dr/dL + B) = 1 d°’r (31) 
dL dL 1 + (dr/dL)? dL?” 

A property closely related to the interface height is the volume of 
the skirt-shaped meniscus, Vmen (Fig. 1), and its axial derivative dVinen/ 
dL. The quantity Vinen can only be evaluated by a numerical technique. 
Egorov et al.’* provide the x, y coordinates of the meniscus in the form 
of the integral 


Q 
= | ———dy+7r(D), 32 


where 
A 2 


Owing to axial symmetry summing up narrow segments of area x*z 
provides the volume of the meniscus as 


Qzsina- (7 +S) (9 2_ 72), 


yi=h-6 


Vinen = 75 > x*(yi), (33) 
y;=0 


where 6 is the thickness of a segment. 

Apart from performing numerical differentiation, no simple method 
exists for the evaluation of dVmen/dL. If the shape of the meniscus is 
that of a right cylinder, then Vien and dVinen/@L are given by 





Vey = r°ah 
and 
Ae _ dr dh 


Simulations indicate that the numerical derivative dVimen/dL is closely 
approximated by 


d Viren ay Ven aVeyi 
dL Vey, dL ° 








(35) 
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lil, RESULTS AND DISCUSSION 
3.1 Computing methods 


We have succeeded in reducing the theoretical determination of the 
DWGS to a differential equation in r, dr/dL and d’r/dL?. Therefore, 
if the shape of the crystal is given, the DWGS is calculable; conversely, 
the DWGS completely defines the radius of the crystal at any moment 
of interest. Here, we address ourselves to the first part of the problem, 
though the eventual objective is diameter control emphasizing the 
DWGS. 

The program to evaluate the DWGS was written in HP Basic. A 
key feature in the efficient organization of the computations is the use 
of subroutines for dm,/dL [eq. (14)], dL,/dL [eq. (15)], dL /dt [eq. 
(23)], A [eq. (26)], dh/dL [eq. (30)], da/dL [eq. (31)], Ven [eq. (33)], 
AVmen/dL [eq. (35)], as well as for r(Z) and its first and second 
derivatives. 

Numerical integration was necessary to obtain Vinen and the crystal 
volumes between O and L and O and Lg. It was found advantageous 
to evaluate the crystal volumes in the main routine because during the 
simulated growth of a crystal incrementing LZ in steps of 0.1 cm, the 
accumulating volume elements could be retained in memory. 

As mentioned earlier, the quantity Lz is only required after the seed 
emerges from the B203(7) encapsulant. Mathematically speaking, this 
occurs when the left-hand side of eq. (16) or (18) becomes negative. To 
determine L,, at each increment of L the left-hand side of eq. (16) is 
evaluated. Then, a linear search is performed, substituting trial values 
of Lz into the right-hand side of eq. (16) in the range Lz — 0.2 to Lz 
+ 0.4 cm in steps of 0.01 cm, where L, is the previous solution to eq. 
(16) at L — 0.1 cm. In case the 0.01-cm mesh around Lz was too coarse 
to yield a solution, an option with 0.001-cm increments was available. 

There are several mathematical tools available to describe the shape 
of the grown crystal, r(L). From the actual cross section a large number 
of r, L coordinates can be generated in tabular form. Then, one obvious 
choice is to employ an interpretation scheme based on point-by-point 
tabular values of 7(Z) in addition to numerical first and second deriv- 
atives. A more sophisticated alternative is to determine the Fourier 
coefficients of the crystal shape. However, we have learned by expe- 
rience that the most efficient and satisfactory scheme to convert the 
tabular data into functional form is by using cubic spline regression. 

A cubic spline function is a piecewise polynomial of degree three 
that joins adjacent polynomials in “knots.” At the knots the functional 
values as well as the first and second derivatives are continuous. 
Ordinarily, a spline passes through all the points within the interval 
bounded by two knots. In contrast, in spline regression a cubic least- 
square fit for the same points is obtained. We have adapted, for the 
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present application, Wold’s spline regression technique in terms of B- 
splines.” 

Fortunately, the key features of the DWGS can be illuminated 
without invoking advanced data-fitting techniques. A judiciously cho- 
sen elementary transcendental function representing r(L), which has 
n continuous derivatives, will be found suitable to illustrate the effect 
of various factors governing the course of the DWGS. 


3.2 Major influences on the derivative weight-gain signal 
3.2.1 Crystal shape 


Typical LEC crystals, including GaAs, exhibit a pronounced 
“shoulder” as the radius rapidly expands from the seed. Boule-to-boule 
variation in the rate of approach to and radius of curvature of the 
shoulder is not uncommon. A mathematical description of the shoul- 
der, in general, and a prototype crystal, in particular, is possible by 
means of the probability density function of the lognormal distribu- 
tion.” Accordingly, we can write for a seed diameter of 0.6 cm 


1 (In? /Lm ’ 
exp -———, | + 0.3. 
sLV2a 2s? 


The median length, Z,,, and the constant s are chosen such that the 
maximum r is 2.8 cm at L = 2.2 cm. From these conditions we have 


Lm = 2.2 exp(s’) 


and a lengthy expression for C that is also only dependent on the 
standard deviation, s. Hence, the function r(L) can be constructed if 
s is given. 

In Figs. 2, 3, and 4 we present the crystal shape r(L) for s = 0.4, 0.8, 
and 1.2, respectively. It can be seen that with decreasing s the radius 
of curvature at the maximum decreases (i.e., |d’r/dL”| increases). 
Furthermore, it can be shown that the rate of approach to the shoulder 
(dr/dL) rapidly rises as s diminishes. 

Substituting eq. (86) and its first and second derivatives into the 
DWGS [egq. (12)] and the subsidiary equations yields the signal and 
the crystal cross section as a function of time, both of which are shown 
in Figs. 2, 3, and 4. The required parametric values are listed in Table 
I. The results demonstrate that the DWGS is extremely sensitive to 
dr/dL and |d?r/dL|. In fact, the steeper the rise in the shape, the 
sharper the peak in the DWGS and the larger its absolute magnitude. 

Two other characteristic features of LEC pulling that are of crucial 
importance can also be observed in Figs. 2, 3, and 4. First, the maximum 
in DWGS precedes in time the maximum in shape by a few hundred 
seconds. This time lag we shall designate as the “precursor.” The 
second property is the additional maximum in DWGS at a time when 


(36) 





r(L) = | 
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Fig. 2—The DWGS and crystal cross section as a function of time for LEC pulling. 
The shape versus length curve (---) is based on eq. (36), setting s = 0.4. The time 
derivative of the unencapsulated weight [dm,/dt, eqs. (14) and (19)] is also shown. 
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Fig. 3—The DWGS and crystal cross section as a function of time for LEC pulling. 
The shape versus length curve (---) is based on eq. (36), setting s = 0.8. The pull rate 
ae in eq. (12)—the factor usually employed in conventional diameter control—is also 
shown. 


the crystal’s radius has already declined. We shall refer to this phe- 
nomenon as the “aftershock.” 

We have investigated the source of the precursor and aftershock. 
Simulations show that neither r nor its derivatives bears any respon- 
sibility. Examining the variation of the interface height with time (Fig. 
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Fig.4—The DWGS and crystal cross section as a function of time for LEC pulling. 


The shape versus length (---) curve is based on eq. (36), setting s = 1.2. The meniscus 
height [eq. (26)] is also shown. 


Table I—Parameters for the calculation of the 


DWGS 
Density of crystalline GaAs, d[g/cm*] 5.17 
Density of molten GaAs, d{g/cm”] 5.71 
Density of B203(¢), ds[g/cm*] 1.55 
Mass of B,0,(¢), ma[g] 129.5 
Pull rate, p[cm/s] 3.6 x 107* 
Crucible radius, R[cm] 3.9 
Wetting angle, 8[deg] 15 
Capillary constant, A[cm] 0.4 
Surface tension, o[dyne/cm] 333 


4) we note that starting with a small but finite value at the seed, h 
saturates beyond the shoulder (h = 0.38 cm).* Hence, no clue can be 
extracted from the form of h. Concentrating on the term including the 
pull rate in eq. (12), one concludes that this conventional description 
of the DWGS?* holds quite well in the early phase of growth but departs 
from reality near the shoulder and beyond (Fig. 3). The maximum in 
the p term and the shoulder perfectly coincide and no secondary 
maximum appears. 

If, however, one examines the derivative dm,/dt [eqs. (14) and (23), 


* The fact that at t = 0, r ¥ 0, thus # > 0 is finite contradicts eq. (24). Therefore, to 
correct eq. (25) the residual time 


Vinen(L = 0) 
b= [aw =o) — tel 


is always subtracted from p. 
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Fig. (2)], it is found that this quantity rises from zero to a maximum 
value precisely at the time of the aftershock and that the magnitude 
of the DWGS in this regime is essentially dmz/dt. In view of the fact 
that dm,/dt is specifically associated with B2O3(/) encapsulation, a 
reasonable hypothesis is that the aftershock is a consequence of the 
LEC technique. 


3.2.2 Meniscus shape and liquid encapsulation 


To isolate the factors leading to the precursor and aftershock we 
have evaluated the DWGS corresponding to the lognormal r(L) [eq. 
(36), s = 0.8] in the absence of liquid encapsulation. In this case the 
major equations still hold provided pz, Vz, ms, mg, and dm,/dt are 
taken as zero. In Fig. 5 we present the DWGS with and without 
interposing a meniscus between the melt and the growing crystal. It is 
immediately obvious that in both instances the aftershock disappears 
from the DWGS. Moreover, if the meniscus is also removed, the 
precursor is missing, i.e., the times to reach the peak in DWGS and 
cross section exactly coincide. In fact by simultaneously eliminating 
B203(¢) and the meniscus from the growth system, only the contribu- 
tion of the conventional pull rate term? is retained in eq. (12). 
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Fig.5—The DWGS and crystal cross section as a function of time in normal Czo- 
chralski pulling without encapsulation. Curves with and without a meniscus present are 
given. The shape versus length curve (—--) is based on eq. (36), setting s = 0.8. When 
me meniscus is absent, the pull rate term in eq. (12) exactly coincides with the curve 
shown. 
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Further confirmation of these effects is offered in Fig. 6. Here the 
calculation of the DWGS is repeated for LEC pulling without a 
meniscus. As we expected, though the precursor is missing the after- 
shock reappears. Apparently, up to and over the shoulder the DWGS 
is governed by the p term. Near the secondary maximum the derivative 
dm,/dt becomes the dominant factor. 

A more physical interpretation of the aftershock is possible by 
plotting the time-dependent radius, r(Lz), of the crystal as it just 
protrudes through the B2O3(/). In Fig. 6 we show both r(L) and r(Lz) 
as a function of time. Besides an expected displacement of the radius 
along the time axis one notes that the maximum in r(L,) occurs exactly 
at the time of the secondary maximum in DWGS. From these consid- 
erations a surprisingly simple explanation emerges. As more of the 
shoulder region becomes uncovered from the encapsulant, the DWGS 
registers the rapidly increasing true weight gain as opposed to the 
previously measured apparent (buoyancy-reduced) weight gain of the 
crystal. Hence, at the time of the maximum aftershock both the freshly 
grown layers at L, submerged in B2O3(/), and the sizable exposed 
portions at L, are detected. 

The discovery of the precursor and the aftershock has a significant 
bearing on the diameter control of GaAs. The precursor is an early 
warning signal that predicts a maximum in radius a few hundred 
seconds before the actual event. In other words the DWGs is already 
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Fig. 6—The DWGS and crystal cross section as a function of time in LEC pulling, 
free of a meniscus. The radius is shown as a function of time at the solid-liquid interface, 
I, and at the top of the B03 layer, Lz. The pull rate term is also shown. The shape 
versus length curve (---) is based on eq. (36), setting s = 0.8. 
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dropping while the diameter is still increasing. Thus the controller has 
sufficient time to react to an unwanted change. 

When the aftershock is observed there is a good chance that the 
controller misinterprets it as an unexpected gain in diameter and, if 
undesirable, will respond accordingly. Then, of course, a visible reduc- 
tion in diameter will occur. By the time this is noticed, corrective 
action taken cannot restore the loss in diameter control. At best, 
beyond the shoulder a “sinusoidal” cross section with small amplitude 
results. 

In view of these observations one cannot base the diameter control 
algorithm for GaAs on the customary p term’ in eq. (12). To be sure, 
from the seed up to near the shoulder perhaps it suffices. However, for 
the bulk of the boule, control by means of such a traditional method 
is clearly inadequate. 


3.3 Comparison with experiment 


We have extracted the salient features of the DWGS arising during 
LEC pulling by the use of the lognormal probability function for the 
crystal’s cross section. To compare the experimentally determined 
DWGS with the theoretical one the idealized crystal shape must be 
replaced by that of a GaAs boule grown by the LEC method under 
closely controlled conditions. Some of the growth parameters of inter- 
est for this specially prepared crystal are listed in Table I. The weight 
and length of the crystal are 523.3 grams and 6.6 cm, respectively. 

The crystal coordinates (r versus L) can be derived from a two- 
dimensional projection employing a digitizer. Owing to the eccentricity 
of the actual (100) boule, a single view is insufficient. We have obtained 
planar projections with sharp contours by a photographic technique. 
The crystal was placed on the top of a light box, backlit, and photo- 
graphed using a high-contrast film. Then, the procedure was repeated 
following a 90-degree rotation around the axis. In this manner, the two 
photos provided a total of four cross sections. Digitizing the shape in 
0.5-mm intervals 133 pairs of (7, L) coordinates were collected for each 
of the four contours. At the shoulder the maximum error between the 
crystal and its projection is less than 0.5 percent. The computed mean 
weight is 528.6 g with a standard deviation of +9.3 g, which should be 
compared with the measured weight of 523.3 g. Clearly, an excellent 
tabular description of the crystal’s cross section has been accomplished. 
The data points for one of the four views are shown in Fig. 7. 

The alternatives to represent the tabular data have been outlined 
earlier. In principle, because of close spacing, there would be no 
obstacle to a simple interpolation of the r(Z) values. However, as we 
illustrated in Fig. 7, using the numerical first and second derivatives 
may be problematic on account of noise. Indeed, when the DWGS was 
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Fig. 7—One of the four experimental contours of a GaAs crystal and its cubic spline- 
regression representation (—). The cubic spline regression of the numerical first deriv- 
ative dr/dL is also shown. In addition, the numerical second derivative d’r/dL? is given, 
together with the quadratic expression derived from the cubic spline of dr/dL. 


computed from the numerical derivatives severe discontinuities ap- 
peared in the results. This suggests that the steep “jumps” in d?r/dL” 
seriously affect at some locations the shape and magnitude of the 
DWGS. 

A different approach is the finite Fourier transform of the r(L) 
points. However, owing to slow convergence a large number of terms 
is required. Eight terms were found to be sufficient to describe the 
seed and shoulder regions. Beyond the shoulder 18 terms were neces- 
sary to fit the data, albeit with concomitant high-frequency noise. 
Since the number of required terms was unpredictable and the com- 
putation inefficient, Fourier representation of the r(L) data was aban- 
doned. Nonetheless, the DWGS based on Fourier analysis provided a 
reasonable description of the measurements. 

Cubic spline regression’ has proven to be the most promising 
method to cast the tabular data on crystal cross sections into functional 
form. At the middle of the axis, the crystal was separated into two 
slightly overlapping domains and within each 12 equally spaced knots 
were positioned. The outstanding spline fit to one of the contours is 
given in Fig. 7. There are two avenues open to obtain the first 
derivative dr/dL. One is a simple differentiation of the shape’s spline, 
which results in a continuous quadratic representation. Unfortunately, 
in this case the second derivative becomes a continuous network of 
connected straight lines (the third derivative is discontinuous) leading 
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to a jagged DWGS. A more fruitful approach is to fit the numerical 
first derivative itself by a cubic spline. Then, the noisy second deriva- 
tive is described by a quadratic expression. In Fig. 7 we show the cubic 
spline fit to the numerical first derivative data as well as to the 
quadratic function passing through the numerical d?r/dL?. We note 
that a reasonably good description of the derivatives has been obtained 
by the spline regression technique. Therefore, all four contours were 
treated in a like manner. 

Having thus established the shape and shape derivatives for an 
actual GaAs crystal, one can readily evaluate the corresponding DWGS 
by means of the analytical tools described earlier. Among the input 
parameters listed in Table I, all the densities were obtained from a 
recent critical evaluation of Jordan.”® The selected contact angle (15 
degrees) is consistent with theoretical and experimental findings on 
Ge and Si.'”'*"° The capillary constant A and the surface tension, o, 
are connected via eq. (27). Though the recommended value for A by 
Egorov et al. is 0.48 cm,” we have achieved a slightly better fit to the 
experimental DWGS using A = 0.4 cm, which is equivalent to o = 333 
dynes/cm. The surface tension of GaAs thus obtained is consistent 
with that for other III-V compounds (InP”’ and GaSb”)Si and Ge.” 

The gross appearance of the DWGS is not overly sensitive to a 
change in A. Nonetheless, a detailed examination reveals that the peak 
corresponding to the shoulder becomes higher and broader as A 
increases from 0.28 to 0.48 cm. With respect to the rest of the DWGS 
one can conclude that the peak-to-valley dynamic range diminishes as 
A drops. Among the effects of the other parameters, we have observed 
a rise in DWGS with an increase in p and a reduction of the time scale 
with a decrease in crucible radius. 

In Fig. 8 we have reproduced the experimental derivative weight- 
gain signal as a function of time, as obtained on a carefully calibrated 
strip-chart recorder during the LEC growth of GaAs. The predicted 
DWGSs for the four contour lines as well as the time-dependent cross 
sections are also shown. Clearly the theoretical curves envelop the 
measured values, providing an excellent overall description. 

The computed DWGSs exhibit both the early warning precursor 
and aftershock. In the inflection before the first peak we can discern 
the time when the seed first protrudes through the B203(/). Since an 
actual crystal may possess additional cross-sectional bulges beside the 
shoulder, the secondary maximum in DWGS reflects both the memory 
of that earlier maximum in radius and the current expansion. To a 
small extent the DWGS is influenced by the input shape derivative 
function. For example, in the trough above the shoulder, a spline of 
the first derivative with many more knots would lead to a smoother 
transition to the next peak. Likewise, near the last peak, an improved 
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Fig. 8—DWGS (—) and crystal cross section as a function of time for four experi- 
mental contour lines of LEC-pulled GaAs. The measured DWGS was traced from the 
original strip-chart recording. 


agreement with the experimental data is possible at the cost of a finer 
derivative spline. Near the bottom of the crystal the accord with the 
measured DWGS is more limited, perhaps on account of the increas- 
ingly negative turn in the joining angle a (between —20 and —40 
degrees). Under those circumstances Tsivinskii’s approximation for 
the meniscus properties becomes inaccurate.” 


3.4 Prospects for diameter control 


The excellent agreement between predicted and experimental 
DWGS strongly suggests that diameter control of GaAs by the deriv- 
ative measurement is feasible. There are two approaches to diameter 
control implied by the modeling calculations. One is a real-time in- 
stantaneous determination of the radius during growth from the 
DWGS by solving the differential equation (12). Then, depending on 
whether r is increasing or decreasing from a preset limit, the RF input 
power is changed by a suitable amount. 

The other method is the construction of an “ideal” crystal with 
appropriate tolerances on a drawing board. Taking into account the 
tolerances for this crystal, a band of DWGS can be evaluated as has 
been done for Fig. 8. Subsequently, the control of diameter is reduced 
to the problem of adjusting the power input in such a fashion that a 
template established by the band of DWGSs is followed. 
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In an effort to free telephone traffic theory from some of its depend- 
ence on independence assumptions, and to reap some benefit from its 
traditional state equations, a systematic search is made to find 
relationships between load, loss, size, structure, and other network 
parameters that are simple, universal, and informative. Three prin- 
cipal topics are covered: 

(i) A load-loss-size formula, linking some half-dozen network 
parameters by a rational function, and used repeatedly to give 
(it) Lower bounds on the number X of crosspoints in networks 

(iti) Asymptotic results about blocking, growth, and complexity of 
selected network structures in passing from finite to “infinite” sources 
at constant load. 

The major results in (ii) imply that for all practical networks on N 
terminals, the crosspoint count X must grow like N log N, te., 
incurring loss by restricting access or concentrating cannot avoid the 
N log N growth rate known to be exacted by nonblocking networks. 
The chief result under (iii) is that as a constant load is spread over 
N terminals, then the number X of crosspoints needed to keep loss 
less than € > 0 need grow only linearly with N, at a rate dependent 
on ¢, while the usage (erlangs carried per terminal) goes to zero. 


|. INTRODUCTION 


The relationships between traffic carried and traffic lost, between 
load and loss, have always been at the center of interest in telephone 
traffic theory. Since the time of Erlang,’ over fifty years ago, the 
principal problems of traffic theory have been analytical: to predict 
mathematically, from the structure and mode of use of a switching or 
connecting network, and from the assumed stochastic behavior of the 
customers, how much traffic the network will carry on the average, 
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and how much it will lose as a result of blocking, overload, suboptimal 
routing, or incomplete searches for paths. As telephone networks have 
become larger, two more design parameters of interest have emerged 
and now command attention: the size of the network as measured by 
the number of crosspoints, and its complexity as measured, for exam- 
ple, by the number of stages of switching it has. 

The probabilistic principle that it is very unlikely for more than a 
moderate number of customers to want to talk simultaneously has 
been the theoretical basis of traffic theory since its start. We can view 
it as an unrefined analog of the principle in information theory that 
separates a relatively small class of events that exhaust most of the 
probability from a remaining large class of very unlikely events. This 
principle has led quite naturally to the use of concentrators, and of 
networks in which blocking, mismatch, and overflow all can and do 
occur as it were by design. It is a function of traffic theory to articulate 
this principle in mathematical models for operating telephone net- 
works, and to use such models to examine its implications for the 
growth and complexity of networks, as well as their loads and losses. 

Even for the simplest stochastic models, progress with these tasks 
and problems has been very slow because of the combinatorial com- 
plexity of the network, the very large number of network states, and 
the lack of approximate methods. Thus, it is particularly important to 
find relationships between load, loss, size, and other network param- 
eters that are simple, universal, and useful, even for very large net- 
works. They should be simple in, for example, not requiring solution of 
very high-order systems of equations, universal in being relatively 
independent of network structure, and useful in providing inequalities, 
estimates of performance, and information about the growth of cost 
and complexity with network size. 

In this paper we try systematically to sketch out some of these 
relationships and associated ideas. The results are of necessity spotty, 
and no claim is made of completeness or originality, only of rigor. 
Three principal topics are taken up here: (z) a load-loss formula, linking 
some half-dozen network and performance parameters by a rational 
function; (ii) lower bounds on the number of crosspoints in a network; 
(iii) asymptotic results about blocking, growth, and complexity of 
selected network structures in the limit of passage from finite sources 
to Poisson arrivals, with total offered traffic held constant. 


ll. SUMMARY 


The organization of the sequel is as follows: By way of some 
background, we start with discussions of blocking, loss, concentration, 
etc., and of their relation to the basic principles of telephone traffic 
theory and engineering. After various preliminary sections on model- 


500 THE BELL SYSTEM TECHNICAL JOURNAL, FEBRUARY 1983 


ling, we call attention to a (known) generalized Erlang formula that 
connects some of the important parameters of an operating network. 
We note its technical consequences, and use them repeatedly in the 
rest of the paper. 

Next we take up the problem of the growth of the number of 
crosspoints with the number N of terminals. The question we try to 
answer is this: When can the N log N order of growth necessary for 
nonblocking networks be reduced by allowing a fixed, small probability 
of blocking, using, for example, concentrators, or other forms of incom- 
plete access? The answer is that it cannot, unless we consider a 
familiar, special kind of low traffic limit in which line usage vanishes. 
It is shown, quite generally, that networks arranged in stages must 
grow like N log N if certain (very reasonable and mild) traffic, access, 
and symmetry conditions are met. This result, similar to known results 
for nonblocking networks, implies that neither judicious concentration 
nor a nonzero loss can lower the order of growth from the N log N 
exacted by the nonblocking case. 

The final sections describe various large networks with a simple 
structure vis-a-vis blocking; their loss probability can be calculated 
exactly in spite of the astronomical number of states. These networks 
are based on such structures as trunk groups, frames, and remote 
concentrators, all familiar to the traffic engineers. We are interested in 
studying these exact solutions as we let the network grow while keeping 
the total traffic constant; this kind of growth amounts to adding more 
and more customers, each of whom contributes less and less traffic, 
and results in a passage from finite to infinite sources (a Poisson 
process of arrivals) at constant offered load. The blocking formulas 
can be studied in this limit, and they lead to close connections with 
the classical Erlang function, E(c, a). As an application, we can give 
methods of synthesizing very large networks with prescribed blocking 
probabilities. In particular, as a constant offered load is spread over 
more and more customers, the number of crosspoints sufficient to 
achieve less than € in blocking need grow only linearly with the number 
of customers. 

There is a bibliography of background reading and related work 
following the references. 


ll. STATEMENT OF RESULTS 


3.1 Generalized Erlang formula 

Using some standard “dynamic” assumptions” to describe random 
traffic, we show that half-a-dozen parameters, all characteristic of 
network size and performance, are related by a simple, rational func- 
tion. These parameters are: 
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N = number of terminals on a side of a two-sided network 
= number of inlets = number of outlets 
m = (mean) carried load = equilibrium average number of calls in 
progress 
A = calling rate per pair of idle customers 
bl = probability of blocking, from the “wire-chief’s” point of view 
o” = variance of number of calls in progress, 


and the formula states that,” very simply, 
A m 
~ (N= m)? + 0? 


For some purposes the parameters 


1-01 (1) 


p = m/N = line usage = erlangs carried per inlet (outlet) 
a = AN’ = total offered load (when everyone is idle) 


are more significant or convenient, and using them the formula is 
recast as 


es Loa 
(l—p)?+ oN 


Indeed, there are many ways of twisting and inverting the basic 
formula, each one illuminating some special aspect; several such will 
appear later. It can be shown that if N — © while a = AN? remains 
constant, then p and o’N~ go to zero, and we have Erlang’s original 
result, the “tautology” 


a(1 — bl) = 


a(l—bl)=m 
or 
: load lost 
Rene 5 cad offered” 


The formula (1), really a generalization of Erlang’s loss formula, is 
useful for the following applications: 
(t) Order of magnitude estimates 
(tt) Asymptotic analyses for large networks, with or without a 
passage from finite to infinite sources 
(tit) Bounds on the number of switches in a network 
(tv) Growth and complexity bounds for networks with given load 
and loss constraints 
(v) Synthesis of networks having prescribed parameters. 
All these applications are illustrated in the text that follows. 


3.2 Concentration 


The principle that high occupancy states are very unlikely has 
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suggested the use of concentration, and it is pertinent to assess the 
effect and value of concentration in large networks. Some results of 
this kind are in the next three subsections, and they warrant these 
conclusions: 

(t) Practical networks must grow like constant N log N, whether 
there is concentration or not. Concentration affects primarily the value 
of the constant, and of course the blocking and the carried load. 

(ii) The extent of possible concentration is limited by the loss and 
the carried load. As might be expected, higher line usage and lower 
blocking imply less concentration. 


3.3 Growth without concentration 


In the prototypical networks without concentration (e.g., those made 
of stages of square switches), the number of crosspoints for N cus- 
tomers must grow like N log N no matter what load is carried. Several 
arguments are given for similar lower bounds, some purely combina- 
torial, others involving traffic concepts and parameters. In particular, 
if blocking is to be kept less than ¢, and total traffic a = AN? is kept 
constant while N increases, the requisite networks must grow like N 
log N if they are made of stages of square switches: especially, for s 
stages 


X = number of crosspoints = N log N + sN + log(1 — €) — a. 


3.4 Growth with full access, allowing concentration 


Some simple and mild combinatorial properties, possessed by all 
practical networks, mandate an N log N order of growth in the number 
X of crosspoints, even when concentration is permitted. A network 
provides “full access” if every inlet can reach every outlet by some 
path. A network is said to be “arranged in stages” (or “made” of 
stages) if its terminals are partitioned into sets Ti, To, --- , Ts41, such 
that 71 consists of the inlets, JT. to T; are sets of internal nodes or 
junctors, and 7,1: are the outlets, with crosspoints placed only between 
T; and Ti+1,1 = 1, --- , s. (See Fig. 1). Here s is the number of stages, 
and every call traverses each T; exactly once, in the specified order or 
its reverse. Finally, a network arranged in stages is called “symmetric” 
if it looks the same from each terminal in any given T;; we content 
ourselves with this informal definition here; a precise one can be given 
in terms of group theory.’ 

We prove this fundamental telephonic fact: A symmetric network 
that provides full access and is arranged in stages must have at least 


enlog N e= 2.71828... 


crosspoints, where N is the number of inlets (outlets, too), and 7 is the 
“neck size,” defined as the size of the smallest 7;: 
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Fig. 1—Network arranged in stages. 


n= min |T\j, |A| = cardinality of A. 


1lsi<s+1 


The ratio n/N is a global measure of concentration or expansion. If, in 
particular, all the TJ; are equinumerous, as happens if the network is 
made of stages of square switches, then the neck size is N, and there 
must be at least eN log N crosspoints, regardless of the traffic char- 
acteristics. 


3.5 Growth without full access, allowing concentration 


The condition of full access is so reasonable that few engineers 
would consider a network that lacks it. Nevertheless, probabilistic 
arguments yield N log N lower bounds for the crosspoint count X even 
when this condition is dropped. The point is that if the network is to 
carry a reasonable load, the “neck size” n cannot be too small; espe- 
cially, it follows from the Erlang formula (1) that n must exceed 
(1 — V1 — p) N for line usage p = m/N. Similar lower bounds involving 
also the required loss can be derived. These lower bounds put a limit 
on how much one can concentrate (measuring global concentration by 
neck size), and they lead to N log N lower bounds for X even when 
there is not full access. For example, any network arranged in stages 
has 


1 
X25 ell — v1 —p) Nlog N + 0(N) 
if each customer’s line carries p erlangs. Thus any sequence of networks 
that grow in the strong sense that p is bounded away from zero must 


grow like N log N, if they are arranged in stages. 
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Fig. 2—Central bus network. 


3.6 Asymptotic results 

Three network structures lead, in the model to be used, to statistical 
equilibrium equations that have the “product form” solution, and so 
all their interesting parameters can be calculated exactly from a 
partition function, and their behavior in the limit N > ~, a = AN? = 
constant, studied. They are (see Figs 2 through 4.) 

(i) The central bus concept—two large N-to-c nonblocking concen- 

trators back to back, with c central buses 


/7 Nk X kc NONBLOCKING CONCENTRATOR 






TOTAL OF N OUTLETS 


N INLETS 


Fig. 3—Frame network. 
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Fig. 4—Remote concentrator network. 


(ii) The frame concept—large concentrators connected pairwise by 
dedicated groups of c trunks 

(iii) The remote concentrator concept—large concentrators, each 
connected to a central distribution core by its own group of c trunks. 
In the “weak” limit as N > ~, a = AN? = constant, the first two 
structures are (not surprisingly) closely connected to Erlang’s formula 
for loss: their probabilities of loss are bounded above by, and approach, 
the value E(c, a). The third is more subtle and elusive, since loss can 
occur in either one of two relevant trunk groups, but also more 
interesting. We can find ‘E(-, -)-type bounds on the loss, but the 
question is whether asymptotically 


loss <= 1 — (1 — 6)? + o(1), 


where 0 is the chance that all c trunks on one (any) concentrator are 
busy, in the limit, remains open. This quadratic upper bound is what 
the loss would be in the limit if the two relevant trunk groups were 
independent, each with “blocking” b. Very few of the many “blocking 
polynomial” approximate formulas used in practice have been vindi- 
cated by so much as an inequality proved in a dynamical model. 

It follows from our analyses that for each e > 0, and each of the 
three kinds of network structure considered, there exist arbitrarily 
large networks of that kind, with loss less than e, and total offered 
traffic a = \N? = constant, whose crosspoint count X grows at most 
linearly with N. We view this as a limit result in the “weak” direction, 
since a constant amount of traffic a = AN? is being divided up among 
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N lines, N — o. The carried loads converge, and so the usage p = m/ 
N, the erlangs carried per line, goes to zero. This linear growth is not 
inconsistent with the N log N orders mentioned earlier; the latter 
follow from combinatorial properties, or are the result of a different 
“limiting direction” in which p is bounded below away from zero. The 
above weak limit result is inappropriate for cases in which, as the 
network grows, each new customer is to supply a fixed number po of 
added erlangs carried; this latter situation enjoins N log N growth. All 
the results about linear growth are given rigorous proofs, for the 
Markov process models adopted, by a passage from finite to infinite 
sources. In particular, no independence or other ad hoc assumptions 
are made to simplify the blocking estimate. 


IV. COMPROMISES AND TRADE-OFFS 


No matter what technology is used to build it, whether Strowger 
switches, crossbars, solid-state crosspoints, or time-division, the design 
of a telephone connecting network is inevitably a compromise between 
the competing criteria of cost and performance. Restricting attention 
to the traffic and operational aspects, it is nearly a truism that an 
overengineered network rich in switches will given unblemished service 
at an unacceptable cost, while one meagerly endowed with switches 
can only provide poor service at bargain cost. The trick the switching 
engineer must perform is to come up with designs that avoid these 
naive extremes. a 

The engineer’s task is also affected by more subtle considerations, 
such as the following: the same pool of switching gear can be organized 
in an efficient way that is combinatorially optimal for connecting many 
pairs of customers in different patterns, i.e., realizing many assignments 
readily. Unfortunately, these efficient ways involve many stages and 
so are usually very complex and difficult to control, because putting 
up a call or removing one requires a lot of information and many 
decisions. Or the same gear can be hooked up in an inefficient, simple 
network of a few stages, which is easy to operate, but since it lacks 
combinatorial power it will have noticeably higher loss. Examples can 
be found in systems in which equipment is dedicated to handle certain 
geographically defined kinds of traffic (see Fig. 3). 

Obviously, then, there are many trade-offs available to the designer 
between complexity, cost, control, and the various performance param- 
eters, such as load and loss. Part of what traffic theory must provide, 
as a proper theoretical underpinning to network design, is an account 
of the options that are available (or unavailable) in the way of equip- 
ment, load, control and structural complexity, growth, and incurred 
loss. Such an account would describe the achievable regions in param- 
eter space, some of the outer limits of possible designs, and the 
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achievable rates of growth of various indices of performance and 
complexity. 

In the past such information for the large networks, which are of 
chief interest, has been sparse and difficult to obtain by means other 
than simulation and admittedly fuzzy theory. Thus, it is pertinent to 
have an account that retains its accuracy and force as the size of the 
network increases without bound. To give a specific example of this 
kind of information, we can put the following questions: For x, p, and 
€ specified positive numbers, are there arbitrarily large networks that 
have blocking probability at most e, line usage at least p, and a number 
of crosspoints per terminal at most x? (To define blocking, let us 
assume for definiteness that call arrivals are random from finite sources 
by pairs, and holding times exponential—a “usual” model, that no one 
can quarrel with.) If there are such networks, how complex are they? 
That is, how fast does their complexity grow as measured by number 
of stages, amount of data needed to select a route, etc.? 


V. THE VALUE OF BLOCKING 


It is widely believed among telephone switching engineers that a 
positive probability of blocking is worth a great deal of switching and 
control equipment. Probably for this good reason alone, the famous 
nonblocking networks first invented by Charles Clos* have never been 
utilized on any but a small scale. Put another way, the canard says 
that to eliminate the last little bit of blocking will take an inordinate 
increase in equipment. But precisely what does the word “inordinate” 
mean here? Can we replace it by a mathematical-function E£(6), which 
increases as the blocking 6 decreases, and whose interpretation is 
somehow that to achieve blocking 6 you need at least E(b) in equip- 
ment? And how is E(6) related to the structure of the network, to its 
size measured by number N of terminals on a side? 

The remark that started the preceding paragraph is really only the 
tip of the iceberg: it is not exaggerating to claim that the mathematical 
basis of traffic engineering is the observation that very likely only a 
moderate number of telephone customers will want to talk to each 
other at the same time. The engineer provides enough lines, junctors, 
switches, trunks, etc., to take care of this overwhelmingly probable 
case, plus maybe a little extra for hedging his bets. The traffic theorist 
provides probabilistic models that make precise the. meanings of 
“average,” “likely,” and “probable” in this setting. Of course the loads 
and losses the engineer seeks or achieves are subject to correction from 
customers’ complaints, public commissions, and supervisors. But no 
matter what the numbers may be, the principle of not having to 
provide for the very unlikely events is universally accepted by all of 
those concerned. This principle amounts to public acceptance of pos- 
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itive blocking probability for public telephony; it stops designs from 
passing the inevitable knees in the cost curves, beyond which costs 
grow very fast for each increment in desired performance. 

Naturally, then, we are interested in quantitative and mathematical 
expressions of the above principle. One can sensibly ask how much 
switching equipment can actually be saved by allowing a blocking 
probability of « > 0. Where in the system can you (and should you) 
save it? In particular, how is allowing a blocking of € related to the rate 
of growth of the switching gear incurring that blocking as the number 
N of terminals gets large? It is known’ that if « = 0 the needed gear in 
crosspoints is bounded above and below by 


const. N log N. 


These results are purely combinatorial, and involve neither probability 
models nor traffic parameters. But is N log N still the right order of 
growth when « > 0, and the problem is posed in the context of our 
“usual” model, with a traffic parameter entering, such as the calling 
rate A per idle inlet-outlet pair? The answer depends on just how » 
varies relative to N as N > o, If it is constant or decreases so slowly 
that the usage p is bounded away from zero, then N log N growth is 
necessary; if \N? remains constant, then for any e > 0, X need grow 
only linearly with N. 


VI. THE KINDS AND ORIGINS OF LOSS 


When we think about the probability of blocking in a network, it is 
useful to ask where and how it originates. At high levels of offered 
load, most of the loss incurred might be due primarily to frequent 
outright overloads of critical “bottlenecks,” even without the occur- 
rence of any combinatorial niceties such as mismatch. At lower levels 
of offered load, the fraction of loss owing to overloads is very small 
because the high occupancy states have very small probability, and 
most of the loss is due to mismatch in states of moderate occupancy 
and high total probability. It is not always possible to draw this 
distinction exactly in practice, nor is it necessary. It is important for 
theoretical purposes, though, because it suggests exactly analyzable 
large network structures (and models for them), in which the blocking 
is either all overload, or overload with certain simple kinds of mis- 
match. Such models can be used to study the trade-offs among the 
traffic and growth parameters; several are described in the latter half 
of this work. 


Vil. THE VIRTUE OF CONCENTRATION 


Let it be agreed that for many terminals N and small calling rate 
per idle terminal-pair, the chance that many customers will want to 
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talk simultaneously is very small; if the principle of not providing for 
the very unlikely event is to be taken seriously, how should switching 
equipment be arranged? A standard and traditional method is to 
concentrate traffic. 

The most efficient networks known today concentrate traffic from 
very many lightly loaded terminals into a heavily loaded central 
distribution core, in which link occupancy may run as high as 0.8 to 
0.9, and then expand it back out in an inverse manner. The number of 
terminals entering the core is typically much smaller than the number 
of inlets, a feature that leads to a natural bottleneck or what we later 
call a “neck size.” The blocking incurred can be thought of as being of 
two kinds, or arising from two sources: concentrator blocking, the 
inability of free inlets or outlets to get a line to the distribution core, 
and internal blocking in the core itself. Each of these sources of loss 
may in turn be due mostly to overload or to mismatch. 

Thus, to reap the economic and operational values consequent on 
allowing blocking, the possible or maximum numbers of calls in prog- 
ress at various places in the network are intentionally limited so as to 
save switching gear, the argument being as before that while having 
more calls in progress at these places is possible, it is so (or sufficiently) 
unlikely that there is no point in providing for it. We ask, how 
effectively can such limits curb the growth without impairing service? 

Now the known nonblocking networks achieve their perfect opera- 
tion by systematic expansion, the provision of more paths than can 
actually be used at one time; this is the antithesis of the concentrator- 
cum-distribution core idea usually used in practice. These nonblocking 
networks exhibit N log N growth, as they must. Since concentration is 
the opposite of expansion, it should lead to a saving in crosspoints. 
How big a saving is it? Especially, when can concentration reduce the 
order of growth to something slower than N log N? 


Vill. ASYMPTOTICS FOR LARGE NETWORKS 


In seeking to answer some of the preceding questions, we shall 
examine the behavior of network parameters as the number N of 
terminals on a side becomes arbitrarily large. To fix ideas we begin 
with some thought-experiments that will lead to more specific ques- 
tions. 

Accepting for a moment the conventional wisdom that all efficient 
large networks use concentrating switches, we imagine a sequence of 
such networks with more and more terminals, and ask: What can 
happen to the efficiency of these networks as they grow? Is it possible 
to keep the loss below a specified amount without having the crosspoint 
count or the number of stages grow very fast? Are there large networks 
which, though they may not be in the running as red-hot field designs 
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for a particular technology, nevertheless are of interest because their 
load, loss, and cost can be easily and accurately calculated or esti- 
mated? If there are such networks, what combinatorial features ac- 
count for the ease of calculation? Where are “most” of the crosspoints, 
in the concentration stages, or in the distribution core? 


IX. LIMIT DIRECTIONS 


Needless to say, some care must be taken in carrying out the 
analyses needed to answer these questions. For most purposes, it is 
enough to make more exact the way in which the traffic and perform- 
ance parameters are to vary as N grows; usually some group or function 
of them is constrained to stay in a given set. Such constraints define 
“directions” in which limits or other asymptotics are being sought, and 
they provide useful ways of looking at the performance of very large 
networks. Two such directions, leading to very different growth rates, 
will be of interest here. 


9.1 The “‘weak”’ limit 


For example; we can let the offered load per idle pair A get small and 
N get big, so that a = AN’ is a fixed constant. This amounts to letting 
the process of attempted calls become Poisson with rate a; the corre- 
sponding limit process is sometimes called a “passage from finite to 
infinite sources,” and in traffic theory is often associated with familiar 
notions such as the Poisson approximation to the binomial distribution. 
We shall show by examples that some interesting limits of this kind 
exist and can be evaluated, leading to functions and concepts well- 
known in traffic theory, such as the Erlang loss E(c, a). Such calcula- 
tions lead to information about the growth rate of cost and complexity 
for large networks that have specified load and loss. 


9.2 The strong constraint 


A very different condition, to be used later, is that the usage p = m/ 
N of our sequence of growing networks be bounded away from zero. It 
says roughly that each new customer, as N grows, adds a fixed amount 
of carried load to the network, at least. This condition, incompatible 
with the “weak” limit, leads to N log N growth in crosspoints, and is 
not, as far as we know, associated with any limits in distribution, the 
way the weak limit is. That is why we call it a condition and not a 
limit. It is physically natural for networks to grow in this manner, and 
so this is an important condition to consider. 


X. NETWORK STATES 


We shall use a model for the structural and combinatorial aspects of 
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a connecting network. This model arises by considering the network 
structure to be given by a graph G whose vertices are the terminals of 
the network, and whose edges represent crosspoints between terminals 
by pairs, with some of the terminals designated as inlets or outlets. 
Calls in the network are described by paths on G from an inlet to an 
outlet. Thus, a connecting network p is a quadruple v = (G, J, Q, S), 
where G is a graph depicting networking structure, J is the set of 
vertices of G which are inlets, {2 is the set of outlets, and Sis the set of 
permitted or physically meaningful states. It is possible that J = Q 
(one-sided network), that IM Q = @ (two-sided network), or that some 
intermediate condition obtain, depending on the “community of inter- 
est” aspects of the network v. Variables w, x; y, and z at the end of the 
alphabet denote states, while uw and v denote a typical inlet and a 
typical outlet, respectively. 

A possible state x can be thought of as a set of disjoint chains on G, 
each joining J to 2. Not every such set of chains need represent a state 
in S: wastefully circuitous chains may be excluded from S. The set S 
is partially ordered by inclusion =, where x <= y means that state x can 
be obtained from state y by removing zero or more calls. It is reasonable 
that if y is a state and x results from y by removal of some chains, then 
x should be a state too, i.e., S should be closed under “hangups.” It can 
be seen from this requirement that the set S of permitted states has 
the structure of a semilattice, that is, a partially ordered system whose 
order relation is definable in terms of a binary operation M that is 
idempotent, commutative, and associative, by the formula x < y iff 
x = x1 y. Here for x N y we can simply use literal set intersections: 
x1 y is exactly the state consisting of those calls and their respective 
routes that are common to x and y. 

An assignment is a specification of what inlets are to be connected 
to what outlets. The set A of assignments can be represented as the 
set of all fixed-point-free correspondences from subsets of J to (2. The 
assignments form a semilattice in the same way that the states do, and 
A is related to S as follows: call two states x, y in S equivalent as to 
assignment, written x ~ y, iff all and only those inlets u € J are 
connected in x to outlets v € Q, which are connected to the same v in 
y, though possibly by different routes. The realizable assignments can 
then be identified with the equivalence classes of states under ~, and 
there is a natural map y:S — A, the projection that carries each state 
x into the assignment y(x) it realizes, ie., the equivalence class it 
belongs to under ~. 

With x and y states such that x = y, it is convenient to use x — y to 
mean the state resulting from x by removing from <x all the calls in y. 
Similarly, with a and 6 assignments such that a = b, we use a — 6 to 
mean the assignment resulting from a by dropping all the connections 
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intended in b. Note that here x — y, a — b have their usual set-theoretic 
meaning. 

It can now be seen that the map y is a semilattice homomorphism 
of S into A, with the properties: 


XxEzy => y(x) = y(y) 
xBy=> y(x — y) = y(x) — y(y) 
y(x Ny) S y(x) N y(y) 
y(x) = ¢ = > x = 0 = zero state, with no calls up. 


Not every assignment need be realizable by some state of S. Indeed, 
it is common for practical networks to realize only a vanishing fraction 
of the possible assignments, and the networks that do realize every 
assignment, the so-called rearrangeable networks, have been the 
objects of substantial theoretical study. Thus, the image set y(S) of 
realizable assignments is typically much smaller than the set A in 
which it is embedded. A unit assignment is, naturally, one that assigns 
exactly one outlet to some inlet, and it corresponds to having just one 
call in progress. It is convenient to identify calls c and unit assignments, 
and to write y(x) U c for the larger assignment consisting of y(x) and 
the call c together, with the understanding, of course, that c is “new in 
x” in the sense that neither of its terminals is busy in x. 

We denote by A, the set of states that are immediately above x in 
the partial ordering = of S, and by B, the set of those that are 
immediately below. Thus, 


a, = {states reachable from x by adding a call} 
= {states reachable from x by hangup}. 


For c new in x, let Ace = Ax N y'[y(x) U c]; Acx is the subset of states 
of A, that could result from x by putting up the call c, because y~*y() 
is precisely the equivalence class of y under ~. If A.x is empty then we 
say c is blocked in x: there is no y € A, that realizes the larger 
assignment y(x) U c. It can be seen that with F, the set of new calls of 
x that are not blocked, the family {A.., c € F,} forms the partition of 
A, induced by equivalence ~. 


Xl. ROUTING OF CALLS 


We shall use a routing matrix R = (r,,) as a convenient formal 
description of how routes are chosen for calls. The class of routing 
matrices, R, can be described thus: for each x € S let II, be the 
partition of A, induced by the relation ~ of “having the same calls 
up”, or satisfying the same assignment of inlets to outlets; it can be 
seen that II, consists of exactly the sets A. for c free and not blocked 
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in x; for Y € II, rx, for y € Y is to be a probability distribution over 
_ Y, that is r,, = 0 and \'y<y xy = 1; Fay is to be 0 in all other cases. 

The interpretation of the routing matrix as a method of choice is to 
be this: any Y € II, represents all the ways in which a particular call 
c (free and not blocked in x) could be completed when the network is 
in state x; for y € Y, rx, is the chance (or fraction of times) that if call 
c arises in state x it will be completed by being routed in the network 
so as to take the system to state y. The distribution {r.,, y € Y} 
indicates how the calling rate owing to c is to be spread over the 
possible ways of putting up this call. Evidently, such a description of 
routing could be made time-dependent, and extended to cover refusal ~ 
of unblocked calls as an option; we do not consider these possibilities 
here. The problem of choosing an optimal routing matrix R has been 
worked on at some length. 


Xil. STOCHASTIC MODEL 


We now recall’ a stochastic model for the traffic offered to a network. 
A Markov stochastic process x; taking values on S can be based on 
these simple probabilistic and operational assumptions: 

(t) Holding times of calls are mutually independent variates, each 
with the negative exponential distribution of unit mean. 

(<i) If u is an inlet idle in state x € S, and uv ¥ u is any outlet, there 
is a conditional probability Ah + o(h), \ > 0, as h — 0, that u attempts 
a call to v in (¢, ¢ + A) if x, = x. 

(iii) A routing matrix R = (r,,) is used to choose routes, as follows: 
If c = {(u, v)} is a call free and not blocked in x, then the fraction of 
times that the system passes from x to y € A-x if c arises when x; = x 
is just Pxy. 

(iv) Blocked calls and calls to busy terminals are declined, with no 
change of state. 

It is convenient to collect these assumptions into a transition rate 
matrix @ = (qx), the generator of x; this matrix is given by 


1 ifyeB, 
_ ) Ary ify € Ax 
dy = ) — |x| — As(x) if y = x, with s(x) = |F, 
0 otherwise, 


and the associated statistical equilibrium (or state) equations take the 
simple form ; 


[|x| + As(x)]px = Y) Dy tA Yo Dyce x ES, 
YEA yEBx 


where {p:, x € S} is the asymptotic distribution of x;. Here |x| denotes 
the number of calls in progress in x, and s(x) is the number of 
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unblocked idle inlet-outlet pairs in x, the possible “successes” in x; 
note that s(x) = |F,. 


Xill. PARAMETERS OF INTEREST FOR DESIGN AND ENGINEERING 


We shall frequently use about a dozen basic parameters, character- 
istic of the operating network, and important for these reasons: They 
describe load, cost, and performance, or they can be measured readily, 
or they arise naturally in the associated traffic theory and are conven- 
ient for calculations and asymptotic analyses. For two-sided networks 
py, these parameters are the following: 


A = calling rate per idle inlet-outlet pair 
N = number of terminals (inlets, or outlets) on each side 
bl = probability of blocking 
m = carried load = expected number of calls in progress 
p = usage = m/N = erlangs carried per terminal 

o = standard deviation of number of calls in progress 

X = total number of crosspoints 

s = number of stages (if v is arranged in stages) 

a = \N’ = total offered load when everyone is idle 


w = max |x| = maximum possible number of calls in progress 
xeS 


= “neck size,” defined for vy arranged in stages separating junc- 
tor groups 7), To, --- Ts41, as the cardinality of the smallest 
T;. 


Remark: The parameter a = AN’ is a convenient abbreviation for total 
offered load, especially for certain weak “large network” asymptotics 
for which AN? is held constant as \ > 0 and N > o. 

Remark: The ratios w/N and n/N are rough global measures of 
concentration, global because there could be, for example, remote local 
concentrators with a concentration ratio different from each of these. 
Clearly, w <n, when both w and n are defined. 

Notation: We write X(v), p(v), etc, whenever it is necessary to express 
the dependence of a parameter on the network ». 


XIV. THE PARAMETER SURFACE 


In the early’ applications of traffic theory to trunking problems, a 
central role was played by Erlang’s loss formula, which depended on 
two parameters, the load a, and the number c of trunks. For connecting 
network studies, though, to take into account at least the size of the 
network and the “finite source” effect, if not other network features, 
a modification of Erlang’s formula is more suitable. (The finite source 
effect is a recognition that busy terminals generate no traffic.) Such a 
formula has been derived in earlier work.? We shall exhibit many 
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useful results that follow from it, or from it together with reasonable 
but special hypotheses, such as the property of a network that it is 
made of square switches, or is arranged in stages, or provides full 
access. , 

For a two-sided network with N terminals on a side, the load, loss, 
load deviation, and rate parameter, A, are related by the following 
formula: 


Generalization of Erlang’s formula to networks: 


1 m 
PO NS me (1) 
a =si< ihe 


(1 — p)? + (o/N)?. 


Proof: 1 — bl is the fraction of attempted calls that are not blocked. © 

By the law of large numbers, this fraction is the rate of successful 

attempts divided by the total rate of attempts, both in equilibrium. 

For a state x let s(x) be the number of inlet-outlet pairs (u, v) that are 

not blocked in x; “s(x)” stands for “successes in x.” The success rate 

is then A )) pxs(x) and the total attempt rate A = px(N — |x|)”. 
xeES xe 


However, in equilibrium, the rate in equals the rate out, so 


AY px8(x) = Y pzl|x| = m. 
xeS xES 


Evidently the total attempt rate can be written as A[(N — m)? + 07], 
and the general Erlang formula is proved. 

Remarks: The gist of the Erlang formula is that six of the parameters 
of interest cannot assume arbitrary values but must lie on a surface 
described by a simple rational function. It is apparent that similar but 
more complex formulas can be proved for one-sided networks, or two- 
sided ones with different numbers of terminals on each side; we shall 
not consider these, because the basic ideas are the same. Note that 
blocking, a complicated quantity in the model, is determined solely by 
the first two moments of the number of calls in progress. Also, if N 
and A vary so that (1 — p)? + (c/N)* approaches 1, the formula 
approaches the exact form m = a(1 —5bl) it has in the “true” Erlang 


case. 


XV. SOME TECHNICAL RESULTS 


The generalized Erlang formula has many useful consequences. 
Some of them are summarized in the next few results, all of which are 
additional relationships among the engineering parameters. 
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2 
Lemma: x <p-—p* (2) 


Proof: Clearly, E|x:|? = ¥ px|x|? = Nm. Hence, 
xeS 
o? = E|x,|? — (E|x|)? = Nm — m’. 


m 
:———— = 1-p. 
Lemma a — BI Pp (3) 
Proof: Lemma (2) implies that 
a 
p’—2p+1l+a5<1-p. 


By (1) the left-hand side is m/a (1 — bl). 
Representation of N: 


m 
+ Pcs) vias 2 + 2 
m m a= wa (m 0°) 
N= = 
fe 
a(1 — dl) 
m1 + v1— 
ashe i 
a(1 — dl) 
where 
sli2 = ie 
cal a(1 — bl) m 
and 
O0<p<l. 
Proof: N is one of the quadratic roots 
m(1 + v1 —p) (5) 
> . 
| eee cane 
a(1 — bl) 


To show that the plus branch is the right one we note that the 
quadratic (in y) function 


m 
y7—-__™ | _ omy + m2 + 6 6 
al ery my +m (6) 
equals m? + o” at y = 0, and has a negative minimum at 


TELEPHONE NETWORKS 517 


m 
m 

PS 

a(1 — bl) 


This value is a minimum because the second derivative is 


m 
2 -as | > 2p 


by Lemma (3). This value is negative because it is 


2 
m? + o -—__ (7) 


; ee ees 
a(1 — bl) 
and the Erlang formula (1) implies that 


a(1 — Dl) 
Substituting this into (7) we see that the value is 


_ yal mM 
a(1 — bl) 


clearly negative. Thus y* separates the two real roots of (6). However, 

Lemma (3) implies that N > y*, so N must be given by the plus sign 

in (5). It is obvious that p is positive; to show p < 1 we divide (8) by m? 
m 


N m ‘ 
p-1-fi-2)1-7 "| <1, 


which incidentally strengthens Lemma (3) to 


+m? = am - No] 1-7 (8) 


m 
Oia eo 
P ail—bl) 


Lemma: If v is such that, at most, w calls can be in progress, then for 
w<N 


s m 
(i) bl=1 ~ X(N — w)? 
(ii) w= NO —V1 —p) 
(iit) p< = = {at (9) 


Proof: In this situation the average calling rate is greater than or equal 
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to (N — w)*A, so (i) follows from the generalized Erlang formula (1). 
Thus also 


ge a Be i ae 
AN7[(1 — p)? + 0°N]” ACN — Tha v) 


2 een Y 2n7-2] — N2 hi 
< N%{(1 —p)? + oN 1= Nay 


But by Lemma (3), 


go ee 
ai—-bhb =» 


and thus (ii) and (zt) via 
N-ws=Nv1-p. 
The same argument proves 


w= N[1 — V(1 — p)? + o°N~]. 


XVI. NONCONCENTRATING NETWORKS GROW LIKE N LOG N EVEN 
IN THE WEAK LIMIT 


One way to display the value of concentrating traffic is to show how 
bad things are when you do not do it. We shall look at a large class of 
practical networks that have no concentration, and show that to all 
intents and purposes, these networks grow like N log N for N terminals. 
In particular, without concentration, these networks have no way of 
trading off nonzero or even substantial blocking (up to some « > 0) for 
slow growth, i.e., slower than N log N, or constant x N log N with a 
constant that depends on usage p. Especially, these networks have 
N log N growth even in the weak limit a = AN” = constant; we show 
later that, in this special case, linear growth is achievable if concentra- 
tion is used, even with blocking kept less than e. 

Now the prototypical network without concentration is one that is 
made of square switches arranged in stages, and these constitute the 
class we consider. Since engineers are primarily interested in networks 
’ with a high degree of symmetry, which “look the same” from any 
terminal within a junctor group, or from any inlet, or outlet, we shall 
restrict attention to networks pv with these properties: 

(t) vy has s= 1 stages 

(tt) The switches in each stage are square, and identical. 

Note that the number of stages is not fixed, and that different stages 
may have different numbers of switches; also, the interconnection 
patterns between the stages (in old terminology, the cross-connect 
fields) can represent arbitrary N-permutations. 
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Theorem: Let m, nz, +++ , ns be s divisors of N, possibly with repeti- 
tions, and consider a network v with N/n; n; X ni switches in its ith 
stage, and blocking bl < «. Then its crosspoint count X(v) satisfies 


X(vy) ZN ¥ ni = Nlog N + N(s + log(1 — €)) — a. (10) 
i=l 


Remark: The number of outlets reachable from an inlet is at most 
II. n;. If every inlet can reach every outlet, then this product exceeds 
N — 1, and in this “usual” case 


Xm =st log il n=s + log Nn (11) 
i=l i= 
and the conclusion of the theorem follows. In the opposite “unusual” 
case, ITj-1 ni < N, some calls are permanently blocked, and the network 
does not provide full access. Thus, one point of the theorem is that 
even in this combinatorially poor situation, blocking cannot save more 
than a linearly growing number of crosspoints when a = AN? is 
constant or grows at most linearly, so that X(v) = O(N log N). 
Proof: Only the “unusual” case N > Ij.1 n; need be considered. Let 
s(x) be the number of possible “‘successes”’ in x, i.e., of inlet-outlet pairs 
idle and not blocked in state x, so that 


(N — |x|)? — s(x) 


is the number of idle inlet-outlet pairs that cannot be connected in x. 
In the unusual case an idle inlet cannot reach at least N — IT?-; ni 
outlets. In state x, at most |x| of these unreachable outlets are busy; 
thus there are at least . 


N — II n; - |x| 
i=l 
idle outlets with no path to our test inlet. This being true for each idle 


inlet, and there being N — |x| idle inlets, the number £, of blocked idle 
inlet-outlet pairs in state x satisfies 


Bb. = (N - |x|) (w- I ni = 1) 
or since (N — |x|)? = s(x) + Bx, 
s(x) = (N — |x]) IT ni. 
Averaging this inequality with respect to the equilibrium state proba- 
bilities {pz, x € S}, and noting that m = Yxes |x| px =A Vaes S(x) pz, 
we find 
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w<(N—m) In: 
A i=1 


Nm 
A Gao 


Therefore, 


XW) =NY ni 


i=] 
=N (s+ log 1 n) 
i=1 
> Ns + Nlog N + N log——— 
me i al — p) 
To find a suitable lower bound on the last term we use the basic 
generalization (1) of Erlang’s formula: 


oe hs - th 
~ MN—m)? +A? al(1 — p)? + oN] 
m 
= ——. 
a(1— p)’ 


Since the blocking DI is less than e, one finds 


log(1 — e) s log(1 — bl) = log zs — log(1 — p) 


m 
a(1 — p) 
X(v) = Nlog N+ sN + N log(1 — p) + N log(1 — €). 
Next we argue that, as in Lemma (2), 
oN? =p-p 
(l-—p)?>+o’N’=1-p, 
and so by the Erlang formula (1) again, 


_1-4Ol Te ee ca 7) a 
p= N a[(1— p)°+o0°N“*]s N G=Nha 











whence 
log(1 — p) > log N — log(N + a) = -= 


and the proof is complete. 


XVII. GROWTH OF NETWORKS ARRANGED IN STAGES 


Almost all the connecting networks used in practice are made of 
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stages, or arranged in stages. This property means that the terminals 
of the network are partitioned into disjoint sets T,, To, ---, Ts+i, 
which are simply ordered as indicated; the first set T, consists of the 
inlets, the next s — 2 sets are internal junctors, and the last set 7341 
consists of the outlets; crosspoints are placed only from terminals in a 
given set 7; to terminals in the next set 7,4; in the ordering (see Fig. 
1). Thus every call in progress is a path from an inlet to an outlet that 
passes through each set 7; once in the order specified. The crosspoint 
pattern between successive sets is called a stage of switching, and is — 
representable as a bipartite graph. The sets 7; need not all have the 
same numbers of terminals; if they do not, there is expansion or 
concentration. The size of the smallest T; is called the “neck size,” and 
is of course an upper bound on the number of calls in progress: 


|x| =n= min |T\]. 
1si<s+1 

We shall restrict attention to symmetric networks, in which the 
network looks the same to every terminal in a given 7;, 1 = 1, ---, 
s + 1. A network provides full access if every inlet-outlet pair can be 
connected by a path through the network, with no doubling back 
allowed. Full access is a natural convenient condition that greatly 
simplifies arguments, but it is not necessary for proving N log N 
growth. 
Theorem: Let v be a symmetric network with N inlets (& outlets), with 
neck size n, providing full access through s stages. Then the 
crosspoint count X (v) satisfies 


X(v)=enlog N e= 2.71828 ---. (12) 


Proof: Let ni, t = 1, «++, s, be the number of crosspoints in stage i 
connected to a junctor between stages i and 1 + 1. By symmetry this 
number is the same for all such junctors or terminals. If stage i is 
represented by a bipartite graph, n; is the degree of each “input” 
vertex. Thus, if the neck size is n, then stage i has at least nn; 
crosspoints, and so 


AXA(vp) =n 5 nj. 
Since pv provides full access it must be true that 
Ns 0 Ni3 
for an inlet can reach no more than II3_; n; outlets; so if N exceeded 
this product there would be some it could not reach. Hence, by the 


inequality linking arithmetic and geometric means, 
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eee 


i=1 


s 1 
>ns (n n) 
i=] 


>nsN°, 


1 
But sN*, viewed as a function of a real variable s, has a unique 
minimum at s = log N, so that 
1 


X(v) =nlog NN" = enlog N, e = 2.71828 ---. 


Remark: If the neck size is N, as it is for networks made of square 
switches, then for symmetric » providing full access 


XA(v) =e N log N. 


Lemma: If v has neck size n < N, then 


1/2 
REN a |e deni a=), (13) 
a(1 — bl) 

This result links the neck size n to the performance parameters m 
and bi and to the traffic parameter a. The second inequality is 
remarkable in involving only the line usage p; the higher p is to be, 
the closer the neck size must be to N. 

Proof: If v has neck size n, then at most n calls can be in progress at 
a time, so that N — |x| = N — n, and by the Erlang formula (1) 

ay eae = 
ADp.(N — |x|)? A(N — n)? 


_ 4 \2 2 m 
(N-—n)*sN wooly: 


The second inequality follows from Lemma (8); it leads at once to this 
basic result: 


Theorem: If v is a symmetric network arranged in stages, providing 
full access and with each inlet carrying p erlangs, then 


X(v) = e(1— V1—p)N log N. (14) 


Proof: (12) and (18). 


Discussion: This inequality says that any symmetric network providing 
full access has const. N log N crosspoints, where the constant depends 
only on the line usage, no matter what blocking is incurred. Using the 
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first inequality in (13) gives a larger constant, now dependent on loads 
(offered and carried) and blocking. 


Theorem: Let v be a network arranged in stages, not necessarily 
providing full access. Then 


X(v) = e(1 — V1 — p)PAN log N — YN logaA 
+ N log(1 — bl) + Nlogp]. (15) 
Proof: As in the proof of Theorem (12) we find 


1 


Xv) = ns( I ni) (16) 


Next, the averaging argument of Theorem (14) gives 


: Nm _ pp 
mat =p) NO =P) 
1 1 


For b > 0, sb* assumes a unique minimum at s = log b, and (b)e> = 
e = 2.71828, --- , so by (16), 


X(v) = ne logs > 


> e(1 — V1 — p)N[log p — logA — log(1 — p)]. (17) 

The generalized Erlang formula (1) can be put in the forms 
AN(1 — 61)[(1 — p)? + P?N~?] =p (18) 
AN(1 — b1)(1 — p)? + (1— p) —1+AN(1 — bI)0P’N7=0. (19) 


The second form is a quadratic equation for 1 — p whose solution, 
picking the plus branch, is 


V1 + 4\N(1 — b1)[1 — A(1 — B1)eP?N] -— 1 
2AN(1 — bl) 


By (18) above, the quantity (factor) 1 — A(1 — b1)o’*N7' under the 
square root is equal to 


1l-p= 


pe 
P(N-m) +o 


and lies strictly between 0 and 1. Therefore, 


y v1 + 4AN(1 — bl) — 1 
2AN(1 — bl) ; 


Let y = 2AN(1 — 62) for short, so that 


1-— 
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Hence, 


Vv1+2y-1 
log(1 — p) < log == 1 a 


og —————_ 
V1+2y+1 
—log(1 — p) > log(v1 + 2y + 1) — log 2 


1 
> 5 log(1 + 2y) — log 2 


>s (log 4 + log A + log N + log(1 — b1)) — log 2. 


Returning now to formula (17) we find 
X(v) = e(1 — V1 — p)N[log p — 4 logA + % log N + % log(1 — d/)]. 


Remark: Theorem (15) shows that full access is not necessary for 
N log N growth; it just makes the constant bigger and the argument 
simpler. 


Theorem: Let vy be a sequence of networks on N terminals arranged 
in stages, and such that 
(t) p(vn) = po > 0 
(i) bl(vy) Se (20) 
(iti) A(vy) is bounded. 
Then as N— 


X(vy) = 5 (1 — V1 — po)N log N + O(N). 


Proof: Immediate from (15). 


Remarks: Theorems (15) and (20) of course apply also to the networks 
made of stages of square switches considered in Theorems (10) and 
(11). However, it should be noted that the possible traffic asymptotics 
in the two theorems are different, although they might overlap. In (11) 
a = AN?’ grows at most linearly, while in (20) it grows at least linearly; 
in (11) a = AN? could be identically a constant (the weak limit case), 
so that A and p both go to zero, and X(v) grows like N log N instead of 
linearly as it might [see Theorem (24) ]; in (20), on the other hand, p is 
bounded away from zero, hence AN is also, so a = AN? increases at 
least linearly, and X(v) grows like constant X N log N, with constant 
depending on the lower bound for p. The point is that the absence of 
concentration exemplified in the square switches compels N log N 
growth even in the weak limit (a = AN? constant, p vanishing), while 
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in general if concentration is allowed it takes the “strong condition” 
P = Po > 0 to force N log N growth. 


XVill. ANALYZABLE LARGE NETWORKS 


We turn now to the study of three simple patterns or structures for 
networks with concentration. Interest in them arises from the fact that 
their loads, losses, and complexity can be calculated or rigorously 
bounded for arbitrarily large values of N. Most of them embody 
features, such as frames and concentrators, which are familiar in 
telephone network design, and some provide tenuous links to previous 
approximate blocking formulas based on independence assumptions. 
These formulas suggest inequalities stating that certain natural “block- 
ing polynomials” are in fact upper bounds on the probability of 
blocking; their proof or disproof has eluded us so far, but they are 
worth mentioning nevertheless. 


XIX. CENTRAL BUSES CONCEPT 


A useful extreme case is a network that, like the trunk group, has no 
blocking states until a certain number c of calls in progress is reached, 
at which point all calls are blocked. One way to build such a network 
is to concentrate both the N inlets and the N outlets down to c 
terminals in a nonblocking way, and then to put a c-by-c nonblocking 
network in between, as shown in Fig. 5. A better way results when we 
note that the central network is superfluous; all you need are c central 
buses with “expanding” networks on each side such that any idle 
terminal can reach an idle bus. An arrangement of this kind is shown 
in Fig. 2, in which each bus has an appearance on every (inlet or 
outlet) subnetwork; when these are nonblocking, a theoretically useful 
solvable case results. We call such networks central bus networks. 


Remark: Clearly, the central bus idea for networks springs right out of 
the idea that in a large network with lightly loaded lines, only a 


oat X ¢ NONBLOCKING CONCENTRATOR 





- ¢ JUNCTORS, 
‘ \ 


INLETS N OUTLETS 


Ui 
¢ X ¢ NONBLOCKING CORE 


c X N NONBLOCKING CONCENTRATOR 


Fig. 5—Network nonblocking up to c calls. 
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moderate number of customers will be talking. The thought is this: if 
the low and moderate occupancy states have all the probability, let’s 
use all our switching gear to make them as nonblocking as possible, 
and just ban the unlikely high occupancy states altogether. Accord- 
ingly, the designer guesses or calculates what that moderate occupancy 
might be, on the average, and provides some larger number c of central 
buses, with nonblocking access for everyone. 

Remark: The central bus network is also a good candidate for the best 
disposition of a fixed number X of crosspoints at very low traffic A. For 
it is known’ that 


blocking = const. A” + o(A”) as AIO, 
where 
m= min {|x|: some call is blocked in x}. 
xe 
Thus, of all networks made out of X crosspoints, the ones for which m 
is largest will have asymptotically least blocking at low traffic, as 
A— 0. 
For traffic purposes, the number of calls in progress is an adequate 


notion of state for central bus networks. Under our assumptions, the 
equilibrium probabilities p, of n calls up satisfy 


Pn = pow; (N= + DUN ~ n+ 2) -++ N? 
and so the blocking is 
< (N-o)'(N-e+ 1)? ... N? 
ce! 
bl = ; 


N?+ EF VA" -J4 1)? +. N? 
ja J! 


We introduce the parameter a = AN? by writing this as 


= a) (5) (3) 





nm Se aT C77 a aT ET (21) 
PEN ae eee Ae Pies 
“ERA O-a) (A 


If we extend each product in the denominator all the way up to c, and 
replace the 1 by the product in the numerator, we increase the formula 
to exactly the Erlang loss function E(c, a). Thus, 


bl < E(c, a). (22) 


A similar argument shows that the traffic carried, m, satisfies 
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m <a{1— E(c, a)]. (23) 


Thus, for central bus networks the load and loss are bounded above 
by the corresponding Erlang load and loss for c trunks and incoming 
traffic a. 


XX. LINEAR GROWTH IS POSSIBLE IN THE WEAK LIMIT 


The central bus concept also leads to an asymptotic estimate of 
what is possible in the way of network growth. We prove 
Theorem: For every « > 0, there is an integer c, and a sequence vy of 
networks on N terminals with c central buses such that as N > © 
with a = constant 


(t) bl(vn) < E(c, a) < €, bl(vy) > E(c, a) 

(ti) m(vy) < afl — E(c, a)], m(vy) > afl — E(e, a)] 
(iii) X(vn) < 136N logec + O(c) 

(tv) s(vn) S 4(1 + logec). 


(24) 


Remarks: This result says that there exist arbitrarily large networks 
with specified blocking whose growth in crosspoints is linear (with 
slope dependent on blocking), whose complexity in number of stages 
is logarithmic, and whose load approaches a constant. Here the growth 
in “size” N is accompanied by a diminution in the offered load A per 
idle pair, according to AN? = a, a constant, in a natural passage from 
finite to infinite sources. Because of the way these networks will be 
defined, they will be at the “combinatorially efficient, hard to control” 
end of the trade-off spectrum. 

Proof of Theorem (24): Given e > 0, choose c to be the smallest integer 
such that E(c, a) < €, and construct a sequence of networks vy with N 
terminals on a side and c central buses, with nonblocking access to an 
idle bus from each side. Property (z) follows from (21), (22); (ii) is a 
result of (23) and the general Erlang formula (1); we have 


N _ m(1+ v1—~p) ae?) (4) 
| ee ee 
a(1 — bl) 


where m is the load, a = AN”, bl = blocking, and p € (0, 1). Since 
m = c we must have, by (4) 
n 
a(1 — bl) 
But by (i), bl > E(c, a), whence the limit in (zz). 
To complete the proof we use the basic bounds on the complexity of 


nonblocking networks given’ by Bassalygo and Pinsker, according to 
whom each of the c X c nonblocking networks needed in Fig. 2 can be 


—>1 as Nf, 
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made using at most 68c logec + O(c) crosspoints and 2(1 + logsc) 
stages. 


XXI. A DIRECT ARGUMENT 


It is a property of the “direction” picked for the asymptotics in 
Theorem (24) that the carried loads m approach a limit, so that the 
line usage p = m/N goes to zero. In practical terms this means that a 
fixed amount of traffic is being spread over more and more customers, 
while the load contribution from any one vanishes. Ultimately, then, 
this limit “direction” is suitable only for very many lightly loaded lines, 
and it would be more interesting to have similar or analogous results 
in which the carried load would increase as the networks grew, and 
the usage p would be bounded away from zero, with the blocking 
always less than some prescribed number « > 0. 

We know from Theorems (15) and (20) that even without the 
constraint “bl < «,” such a positivity constraint on p necessitates 
N log N growth for practical networks. It is instructive, however, to 
give a separate direct argument for this behavior in the case of the 
networks constructed in Theorem (24). 


Lemma: For 1 > e€ > 0, let c be the earliest integer such that 
E(c, a) se. Then 


c=a(1—e). (25) 


Proof: By hypothesis, E(c, a) <= « < E(c — 1, a). As is well-known, the 
Erlang function satisfies the recurrence 


E(c, a) = : 
eae See 
akK(c — 1, a) 
Thus, 
1 
- <e< E(ce—-1, a). 


1 +——_—_—_—_ 
ak(c — 1, a) 


Replacing E(c — 1, a) on the left by the smaller e will decrease the left, 
giving 





Se. 


c 
t+ — 
ae 


Proposition: Let » be a central bus network constructed to have 61 <= 
e, as in Theorem (24), by the method® by Bassalygo and Pinsker. Then 


X(v) = 8DN logeN + 4Np{logs(1 — €) + logeA]. (26) 
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Proof: v achieves usage p = m/N, so there must be a state x € S with 
|x| = Np. The method of construction implies that v is a network 
having 4 + 4 logec stages, where c is the number of central buses, 
chosen to be the earliest integer c such that E(c, a) <= e. Thus every 
call passes through 4 + 4 logec crosspoints. Since two calls do not pass 
through the same crosspoints, the state x has at least 4Np logec busy 
crosspoints, and so 


X(v) = 4Np logoc. 
But by Lemma (25) and the choice of c, we have 
c= a(1—e) =AN7(1 — ©) 
logec = 2 logoN + loge(1 — €) + logeA (27) 
X(v) = 8pN logeN + 4Np[loge(1 — €) + logsA]. 


It follows that any sequence of such networks, growing so that p is 
bounded away from zero, will grow like N log N. For then AN is 
bounded below, and (27) implies 


X(v) = 2pN logaN+ O(N), as No. 


XXil. FRAME CONCEPT 


The second kind of network structure we shall study is called a 

frame. It is familiar to engineers from the No. 5 and earlier crossbar 
systems. The idea of the frame is to mount all N terminals on a side in 
k groups of N/k on subnetworks that are connected pairwise by 
dedicated junctor groups. In the two-sided case shown in Fig. 3, these 
connections are described by a complete bipartite graph. We shall 
suppose that the inlet subnetworks are N/k by kc and identical, and 
are mirror images of the outlet subnetworks, with c trunks or junctors 
connecting each pair of inlet-outlet subnetworks. A solvable limiting 
case results when we make the subnetwork nonblocking, so that loss 
is due always to overload of one of the groups of c junctors between a 
pair of subnetworks. 
Remark: It is tempting to conjecture here that in the natural “weak” 
limit N —» 0, \—> 0, AN? = a, a constant, the loss for the frame network 
with nonblocking concentrators will approach the Erlang loss 
E(c, ak~*) for c trunks and Poisson traffic ak~. 

It is not hard to see that if the subnetworks in Fig. 3 are nonblocking, 
then to define the transition rates that devolve from the stochastic 
model it is enough to know how many trunks are busy in each of the 
k? dedicated groups of size c. Therefore we can use, instead of our 
usual microscopic semilattice, a very much reduced? notion of state, as 
follows: we describe the system by a k X k matrix x of integers (x,;), 
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0<1,7 =k, with the interpretation that 


xij = number of calls in progress from subnetwork i on the left to 
subnetwork 7 on the right. 


The x; are restricted to be integers between 0 and c, the capacity of 
each (i, 7) trunk group. It is useful to have the notations 
k 


x; = } x; = number of calls in progress on subnetwork 
j=l 


ion left 
xi = > xij = number of calls in progress on subnetwork 


t=1 

J on right (28) 
x*(t, 7) = the state resulting from x when a new call 

is added to the (2, 7) trunk group 
x (i, 7) = the state resulting from x when a hangup occurs 

on the (i, 7) trunk group. 


With px the stationary probability of x, the statistical equilibrium 
equations are 


k k 
| Y xytA Y 1. (N — xi)(N — x) 
ij=1 


ij=1 


k 
= 2 [peri s(ri+lay., + AlsjuoPx s(N — xi + 1)(N — x’ + 1)]. 
Lj= 
Here the indicator functions for x; ~ 0 and x; # c give the right 
equations on the “boundary” of the state space. The solution of these 
equations has the convenient product form 


Nad .(N\(N 

a Lys! ; 

es po Tate’ (N)(N 
iJ 


Pod m @ Pia sles wee, ae) (* > 


where po is the chance that no calls are up, given as the reciprocal of 
the normalizing sum as 


a1 = |x| xi x! N N 
Hla tall’ all) 


The sum over the states x is over all k X k matrices whose entries are 
integers in [0, c], including the identically 0 state, which contributes a 
term 1. 

Under our symmetry assumptions the probability of blocking be- 
tween two outer subnetworks 7 and / is the same as the overall loss, 
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and is given by 
Y Pilz,-c(N — xi) (N = x’) 


ee en Nae 


This formula does not depend on the normalizer po’. Although it is 
exact, it will be of interest to us primarily for the insight it gives into 
the asymptotic behavior of bl as N— ©, \ — 0, with AN? = a, constant. 
To this end it is enough to look at p,/po. Writing it in the form, for 
x #0, and using ; x: = Yj x’ = |x|, 


Nad s-1 /N x/—1 N 
BLD Tee ye (G-") HA (G- ) 
1] 
AN? |x| 
(FF) 
,-x#7! ee hott | fees: 
N/k} N/k ‘N/k )} n/K 


ij xi! 








(the products interpreted as 1 if x; or x’ is 0) one can see that 


: (ak~*)!*! 
lim px = ————_; 
ae ed aaa 
A>0 ij 

AN? =a 


write a = ak~? for simplicity; the probability of loss goes to 


a aitt-e 


c! xyj=e IT Xke ! 
k,t#i,7 


c gt 
aie |x|-n 
ier eR 


n=0 71. xjj=n 


II Xen! 
ktAL7 


It can be verified that for the various n up to c, the 5} summands in 





xian 
the denominator are all equal, and equal to the }} sum in the 
x jj=e 
numerator. Thus, as conjectured, 
a’’ 
c! 
lim bl =- a E(c, a) = E(ec, ak”). (29) 
0 — 
XN2 n=0 n! 


— =a 


bk? 
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Table |—Load loss relationships 


Loss Load 
Central bus E(e, a) a[l1 — E(c, a)] 
Frame E(c,ak~) afl — E(c, ak~)] 


As for central bus networks, one can get linear growth in the weak 
limit, described as follows: 

Theorem: For every « > 0, there is an integer c and a growing 
sequence vy of two-sided frame networks on N terminals, with N/k 
subnetworks on a side, such that as N—> ~,\ > 0, and a =AN? = 
constant, 


(t) bl(vy) > E(c, ak~’) Se 

(ii) m(vy) > all — E(c, ak~*)] 
(tit) X(vn) S 2R[68N logec + O(c)] (30) 
(tv) s(vn) = 4(1 + logeke). 


Remark: The reader can check that it is quite proper to regard the 
frame network with k concentrating subnetworks on a side as a system 
of k? central bus networks: the loss is asymptotically Erlang E(c, -) in 
both cases, and the carried load for the frame is k? times the carried 
load on the corresponding central bus, as shown in Table I. 

Proof of Theorem (30): This proof is very much like that of (24). We 
chose c to be the least integer such that E(c, ak~’) < «, and construct 
nonblocking N/k X ke concentrators for the subnetworks, each using 
no more than 68(N/k)loge(kc) + O(kc) crosspoints and 2(1 + logekc) 
stages. Convergence of the loss has been proved as (29), and that of 
the load follows as usual from the Erlang formula (1), as in (24). 


XXIll. REMOTE CONCENTRATOR CONCEPT 


A third network structure worth looking at consists of concentrating 
subnetworks each connected only to one and the same central “core” 
network by a group of c high-usage trunks, as in Fig. 4. We call this 
structure, well-known to traffic engineers, the “remote concentrator 
concept;” it represents an extreme form of the advice to separate 
concentration and distribution in the network. We shall suppose that 
the subnetworks are divided into two groups, one carrying the inlets, 
the other the outlets, so that the whole network is still two-sided. 
When the outer subnetworks (concentrators) and the inner core are 
all nonblocking, a useful solvable case results, and we can again guess 
that as the right limit is taken there will be some connection with 
Erlang’s EF function. Since the success of a call attempt depends 
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entirely on finding first a free trunk into the core, and then one going | 
out from it to the destination concentrator, it is tempting to guess 
that asymptotically the loss is given by the “blocking polynomial” 
1 — (1 — b)?, where b is the chance that all c trunks in a group are 
busy. This simple guess is probably not true, because of the correlation 
between loads on trunk groups; it may nevertheless be a bound, 
although we have not been able to prove this: the question whether 
the blocking owing to simultaneously full groups is larger or smaller 
than 6” is open. We can, however, give Erlang E bounds on both kinds 
of blocking, as well as exact loss formulas for finite N in terms of 
logarithmic derivatives of a partition function. 

Let k be a divisor of N, fixed henceforth, to be interpreted as the 
number of concentrators on the inlet side of the network, each with 
N/k inlets, c trunks to the core, and nonblocking. The outlet side is 
similarly constituted. As a notion of state we can take the matrices 
x = (xy, 0 = 1,7 = k) with the meaning 


xi; = number of calls in progress 
from inlet concentrator z to outlet concentrator /. 


These matrices are subject to the condition that both rows and 
columns must not sum to more than c, the trunk group size. This is 
the same notion of state used for frame networks, except that there 
the entries had to be at the most c, while here the row and column 
sums are thus bounded. 

Using the same notations (28) as for the frame networks, we can put 
down the following equilibrium equations: 


k k N ; 
Dx >; Xi + rN »y lyccxi<e & ae “| 
tj=1 i,j=1 k 


k 


N N ‘ 
= > | eruntau + Mlsjce + Alz,p0Px- i,j) (7 x] (7 = 1) | 


ij=1 


Again, the indicator factors give the right equations at the boundary 
of the state space, and the solution has the same product form as for 
frame networks: 


k : 
_ |x| x x/ N/k N/k 
Dx PDoA ie os eee ; va) (., eee ’ w)A Xi x! > 


where po is the chance of no calls in progress, the reciprocal of the 
normalizer 


x Ar 7 Xi x N/k\ (N/k 
zeS ja \Xin, o> , Xie] \ Hy, +++ Xe] \ Xi oa a 


The sum over the states x € S is over all k X k matrices with 
nonnegative integer entries, and row and column sums at most c. 
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By symmetry, the probability of blocking between two remote 
concentrators i and 7 is the same as the overall loss, and is given by 


N N . 
; aes Ss __ — x 
2 Pillinc A Ixinc] (7 x\(3 x ) 


EoG-*)(E-*) 


This formula depends only on the known ratios p;x/po. We examine its 
asymptotic comportment as A > 0, N > ©, with AN? = a, constant. 
The argument for (29) gives 

a [x| 

Rk? 


lim p:/po => 
Noo 


A—0 II ; xi! 
AN?2=a i,j=1 


bl = 





XXIV. THE PARTITION FUNCTION 
We have, for the remote concentrator concept, 


Ix| j 
Pz = pod i a BEY va) (ss, re ws) ( * si 


= poA!*!e(x). 
Hence, introducing the generating function 
ke 
oy) =1+ Vy’ Y efx), 


j=l [xl 
xES 


the moments of the number of calls in progress can be expressed as 
logarithmic derivatives; especially, 


d 
m= AH los (A) 


2 


d d 
— 2 a 
o=)X ae log (A) +A TA log #(A), 


and by the generalized Erlang formula (1), the blocking is determined 
via 
1 m 
—- bl =—-——_—_ 
: A (N—m)? +0? 
d 
N log $(A) 


d f d? d 
E A oh log 60 | +A ae log (A) +A an log (A) 
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The interested reader can verify that analogous results hold in the 


Erlang case of c trunks offered traffic a: the generating function is 
2 





— y- eoe y* 
tO) Sty tS Pee 
so that 
oe _ ¢(a) 
1-bl=1 EC a) 


d 
m=a(1— E(c, a)) =a aa log o(a). 


The essential form of these relationships persists in the weak limit 
A— 0, N— ©, AN? = a = constant. We write 


k j ! 1N—21z1 
pe = pool”! Ol ( xi )( x (N/R)\(N/R)IN 
ij=l \Xi1, 22° Xin] \X1j, oe? 5 XR N \ N ph 
Pac hel ad ha 
and take the limit as above. Stirling’s formula implies that the partition 


function becomes 


ke 
o(y) =1+ Py oe +S oe 


[x|=¢ 
Xi x/ 
k Xi, 29% y XipJ\ Xj, 02° , XR 


row sums <c 
column sums =c 

Wy ay 

ij=l xilx?! 


Then since (N — m)? + o° ~ N? in the weak limit, one finds 


m— a+ log (a) (32) 
1— Obl oy (a) (33) 
or og (a). 


Theorem: For every integer k and every € > 0, there is an integer c, 
and a sequence vn of “remote concentrator’ networks, with c trunks 
from each of k N/k X ¢c concentrators on a side to a central core 
ke X ke, such that as} > 0, N— », AN? = a= constant 


(i) bl(vy) > 1- < log ¢(a) <e 


‘ d 
(ii) m(vn) > a log $(a) (34) 


(wit) X(vn) = = [68c logee + O(c)] + 68kc logeke + Of(Rc) 
(tv) s(yy) = 6(1 + logec) + 2 logok. 
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Proof: Again, this proof is like that of (24). Using (32), the blocking 
can be written as 


a 


j 7 1 
XN? y, pe(1-2)(1- 5) a+ o(1) 


=l|- = log (a) + o(1). 
So with AN? = a, fixed, pick the integer c in the partition function 
¢(y) defined by (31) so that 


d 
1- Faq 18 (a) <e. 
This is possible because the limit blocking must decrease to 0 with 
increasing c. Thus, (32) proves (Z) and (iz); (iii) and (iv) follow by the 
same kind of concentrator construction as before, using the method of 
Bassalygo and Pinsker.® 


XXV. BLOCKING INEQUALITIES FOR REMOTE CONCENTRATOR 
CONCEPT 


When the concentrating subnetworks and the core are nonblocking, 
it is possible to derive some interesting Erlang E bounds for the 
blocking in a remote concentrator structure. We first note that the 
contributions to blocking are of two kinds: a call is blocked between 
subnetworks i and j because x; = ¢, or x’ = ¢, or both, ie., x; + x’ = 2e. 


Thus, 
N N ; 
2 nF = x) (F = (ace + lgiee i 1,+2/=2¢) 


N Ns, 
zee *)(5-*) 


= > Px(1x,=c + [zinc — 1y,+2/=2c) + o(1). 


xeS 


By symmetry the 1,-. and 1:/-. terms contribute the same amount, so 
the problem of estimating the blocking reduces to estimating: 
(i) the probability Pr{x; = c} =)iz-c px of having “all trunks busy” 
on concentrator v, and 
(ii) the “double trouble” term Pr{x; + x’ = 2c}. 
It is convenient to define, for states x € S, and1<i,j=R, 


s(x) = (1 a 1x,=c) (1 Dae 1xj=c) (Z a x) (=z = v/) 


This is the number of unblocked idle inlet-outlet pairs (u, v) with u on 
concentrator i and v on concentrator /. 
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Lemma: For integers 0 S t S w = maxxzes|x|, the chance of t calls up 
on concentrator i can be represented as 


t t-1 


Pr{x: = ¢} = Pr{xi= 0} 5 nz (35) 
1 ¢= 


where 


k 


— N-2¢ Dx ij 
1 a 2 Pris = ¢} » e (x) 


= N~*E {number of unblocked calls on i|Zi-trunks are busy} 
=k". 


Proof: This follows from the form of the statistical equilibrium equa- 
tions, which says that the rate into a set is the rate out of it. We use 
the sets {x € S: x; = t}, which partition S and communicate by pairs 
in a simply ordered array except at the endpoints 7?= 0 and ?= c. The 
result follows by iteration. In a similar way it is found that 


Lemma: 
. ; a’ t-1 
Pr{x; + x’ = t} = Pr{x; + x’ = 0} ri IT &,, (36) 
. 
where 
2¢ Dp x 
= Zz ee ee im _ 2s. mj 
&; N ne Pr{x; + x = ¢} >» [s (x) 7 (1 Sim) Ss (x)] 
N~“E{number of unblocked calls to i or i|x; + x’ = 7} 
2k —-—1 
an 
Theorem: 


Pr{x:i = c} S E(c, ak~*) 


Pr{x; + x/ = 2c} = E(2c, 2ak™ — ak~’). (37) 
Proof: Introduce the function on the positive orthant 
les N 
FY +++ Ye) = ———__ 
L+yityiyot ers $yiyoes+ ve D 


this is increasing in each y; there, since 


Of NL +y2t+ yoyo ++ +192 ses Yew ce 
Oy: yiD? ; 


By normalization, )\-o Pr{x; = t} = 1, so Lemma (35) says that 
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a a 
Pr{x; = c} = j(am 3 My, °°", = m=} 


< f(a e, ae 4 = E(c, a/R). 


The proof for the “double trouble” term is analogous, from Lemma 
(36). 


XXVI. CONCLUSIONS AND PROSPECTS 


For the narrow class of probability models for telephone networks 
described by “finite sources, exponential holding times,” we have 
shown that the N log N rate of growth (of the number X of crosspoints) 
characteristic of nonblocking networks extends also to those with 
blocking. This narrow class provided easy methodological devices for 
carrying out the proofs. Extensions to more general statistics have 
been mentioned in an interesting series of papers by N. Pippenger, 
listed in the bibliography. However, his results are either combinatorial 
or restricted to Markovian models similar to ours. Since some of his 
principal demonstrations depend on what amounts to the old “lost 
calls held” convention applied to finite sources, his results are strictly 
not comparable to those given here. Extensions to the distribution-free 
context remain to be made. As Pippenger suggests, the most useful 
tools are likely to be the entropy concept and ergodic theory. 
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Upper and Lower Bounds on Mean Throughput 
Rate and Mean Delay in Memory-Constrained 
Queueing Networks 


By E. ARTHURS and B. W. STUCK 
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Operators use terminals to enter transactions into a system and 
then wait for the system to respond. The system contains serially 
reusable resources, and can hold a maximum number of jobs. Each 
job requires a total mean amount of service at each stage. We 
calculate upper and lower bounds on the mean throughput rate and 
mean delay as a function of model parameters, and present examples 
that show these bounds are sharp, in the sense that they are achiev- 
able given only mean values. We also present partial results for 
closed queueing networks where the long-term, time-averaged distri- 
bution of number of jobs at each node in the network obey so-called 
product form separation of variable type of probability distributions. 
Examples and data from actual systems illustrate the utility of the 
work. 


|. INTRODUCTION 


At present there is great interest in modeling the traffic-handling 
characteristics of computer and communication systems using 
queueing networks.’* The change in cost of electronic solid-state 
circuitry” and rising personnel costs”® offers strong incentive to design 
cost-effective digital systems. 

Computer communications systems can often be modeled quite 
naturally by a network of queues, where a job receives service at one 
stage or queue and then migrates to another stage, until it is completely 
serviced. Examples of actual systems and associated models are pre- 
sented in later sections of this report. This class of models captures 
several fundamental phenomena of such systems, including asynchro- 
nous and concurrent execution of different jobs and different amounts 
of service required at each stage of execution. To answer whether this 
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type of model is valid, controlled experimentation and measurement 
must be carried out, and goals or criteria must be set for judging 
goodness of fit. Finally, one would like to use these models to predict 
or extrapolate behavior into unknown regions of operation to guide 
decision making. 

Here we focus on one technique for bounding the mean throughput 
rate and mean delay of an abstraction of a computer communications 
system. This is only one factor among many others, such as cost, 
flexibility, and reliability, that must be considered in choosing one 
design over another for a given application. We drop these other 
factors from consideration after this point in the interest of brevity. 

Broadly speaking, there are two reasons for wanting to quantify the 
traffic-handling capabilities in a computer communication system: 

(t) Cost reduction of an existing service or product: 

(a) In an existing system, it is often possible to modify existing 
scheduling policies to improve performance at an acceptable 
cost. An example would be to change from one memory 
partition per application program to a memory pool shared 
among all application programs. 

(b) In a system handling a fixed set of job types, different 
equipment configurations can accomplish these jobs at dif- 
ferent costs. Which should be chosen? An example would 
be to compare using two slow disks versus one fast disk. 

(tt) Comparisons are often desired between current operations and 
wholly new modes of operations. An example would be using an 
existing batch computer system for time sharing, using the existing 
time sharing system for electronic mail, or using existing word proc- 
essors for voice-annotated text services. 

To quantify these issues, typically two stages are involved: the first 
is synthesis, where goals are stated along with different alternatives 
for reaching those goals, while the second is analysis, where the 
performance (here the mean throughput rate of finishing jobs and the 
mean delay for each stage of job execution) is quantified. Goals may 
be either oriented toward the total system, such as total number of 
jobs of a given type that are handled during an hour, or toward an 
individual, such as the mean delay to handle one or more stages of a 
job; along with goals such as these there should be some measure of 
the sensitivity of the goals to different operating points, and so forth. 

Analysis often begins by postulating a set of parameters that carry 
or capture specific operational aspects, drawing inferences based on 
these parameters (either by mathematical analysis or by discrete event 
simulation),”"" measuring actual or simulated operation, and then 
repeating this process until it is felt that additional work is no longer 
warranted. 
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Our goal here is to demonstrate how to carry out part of this process 
by a straightforward technique for obtaining bounds on mean through- 
put rate and mean delay, given only mean value information for the 
service required at each stage of job execution. In our opinion, there 
are three principal contributions: 

(z) Anew technique for obtaining a Jower bound on mean through- 
put rate and an associated upper bound on mean delay. Earlier workers 
(e.g., Ref. 1, pp. 212-25) obtained an upper bound on mean throughput 
rate and an associated lower bound on mean delay. Furthermore, we 
present an example that shows that given only mean values for the 
amount of service required at each visit to each stage in the queueing 
network model, either bound can be approached arbitrarily close, 
depending upon on the amount of fluctuation present about the mean 
service times. This shows that these bounds are sharp, much as was 
done earlier for loss systems.’””® The interested reader is referred to 
related works.‘ 

(zit) A new technique for calculating both upper and lower bounds 
on the mean throughput rate and mean delay for a class of closed 
queueing networks whose long-term, time-averaged distribution for 
the number of jobs in system obeys a so-called product form or 
separation of variables decomposition”® (for an application case study, 
see Ref. 16). The upper bound on mean throughput rate is the recip- 
rocal of the total mean time to execute the transaction plus the 
average time spent in execution per node, while the lower bound on 
mean throughput rate is the reciprocal of the total mean time to 
execute the transaction plus the maximum mean time spent in exe- 
cution per node. The tightness of these bounds will thus depend on 
how close these factors are to one another (Zahorjan et al., developed 
these results independently; "’ our derivation is felt to be more straight- 
forward). 

(zt) Data from controlled experimentation on actual computer and 
computer communication systems is presented to validate the ap- 
proach presented here. 

The examples presented are deliberately elementary, chosen for 
tractability. Everything of interest can be represented by formulas. 
Furthermore, this approach is a natural starting point for virtually any 
study of traffic-handling performance, can be refined in a variety of 
ways, used to check and bound much more complex analyses or 
simulations, and can be immediately related to measurements in an 
actual system. Often data are simply not available to describe the 
arrival statistics and service required for each step of each job, such as 
would be needed in simulation studies; this suggests using a mean 
value (distribution free) analysis, rather than more stringent distribu- 
tional assumptions, and then assessing performance sensitivity by 
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varying the mean value, rather than investing effort in simulation 
studies. We advocate synthesis via analysis of the performance of a 
given configuration. The approach adopted here is not exhaustive, but 
it is fundamental. The examples show that only two avenues are 
available for improving computer communication system performance, 
reducing the time to handle a given task (i.e., speedup) and handling 
two or more tasks simultaneously [i.e., concurrency, either real (mul- 
tiplexing multiple resources) or apparent (via scheduling a single 
resource) ]. 


ll. A MODEL OF A PROCESSOR AND DISK SYSTEM 


In this section we deal with a mathematical abstraction of an on- 
line transaction processing system. 


2.1 Model description 


Clerks at terminals spend a mean amount of time reading, thinking, 
and entering the transaction, and then wait for the system to respond 
before repeating this cycle. Each transaction requires a mean amount 
of processor time and disk secondary storage access time to be com- 
pletely executed. The system is configured with a finite amount of 
memory, and hence can hold a maximum number of jobs at any one 
time. Figure 1 shows a hardware block diagram of the system. The 
cycle that a job or transaction follows can be described by a path 
through a network of queues. The first stage or queue is associated 
with operators at terminals entering each transaction. Next, each job 
enters a staging queue, where it waits if there are already the maximum 
number of allowable jobs in the system, and otherwise it immediately 
enters the system. Once inside the system, a job will receive some 
processing, then require accessing some data from secondary disk 
storage, then some processing, and so forth until it is completely 
executed. Finally, control will return to the operator at the terminal 
and the process begins anew. Figure 2 shows a queueing network block 
diagram for this system, consisting of four queues: one for operator 
jobs, one for staging, one for processors, and one for disks. 

The ingredients we need are 

(t) The number of clerks, C, actively submitting transactions to 
the system 

(tt) The number of processors, P, and disks, D, connected by a 
common switch. (The switch is assumed to be much faster than any 
step of job execution involving either a processor or a disk, and is 
ignored from this point on.) 

(tii) The maximum number of jobs allowed inside the system at any 
one time, 
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Fig. 1—Block diagram of the system hardware. 


(tv) The mean time each operator spends reading, thinking, and 
entering each transaction, denoted Tink 
(v) The mean processor time, T processor, and mean disk access time, 
Taisk, per transaction. 
The outputs of the analysis are the mean throughput rate of executing 
transactions and the mean response time (as seen by operators, in- 
cluding both execution time plus waiting time) as a function of model 
parameters. No job is assumed to be capable of executing in parallel 
with itself. The operating system multiplexes available jobs among 
available processors and disks to achieve some degree of concurrent 
use of resources. 
From the vantage point of an operator at a terminal, we see that 
each transaction undergoes two stages of processing: 
(t) A stage spent preparing the transaction for execution, with 
mean time interval, Think 
(it) A stage spent waiting for the transaction to execute, with mean 
time interval, FR. 
For one operator at one terminal, the mean cycle time per transaction 
is simply the sum of the mean preparation time and mean delay. 
Hence, when C operators are active, the mean throughput rate equals 
simply C times the mean throughput rate for one operator: 
C 
mean throughput rate = A = Tous R 
If we rewrite this equation to find the mean response time, we see 
_ C 
~ mean throughput rate 
These two relationships will be fundamental in determining feasible 
operating regions for mean throughput rate and mean delay. 


= T inink: 
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Fig. 2—Block diagram of the system queueing network. 


2.2 Mathematical problem statement 


The system state space is denoted by 2, where 
Q = {(Soperator, stages Jprocessors Jaisk) | 
Soperator + Sstage + Iprocessor + Saisk = C; S processor + Saisk 
= min[M, C — Joperator]}. 


At any instant of time, the system is in a state given by (Joperator, Jstage 
J processor, Jaisk). Hach component is integer valued, and refers to the 
number of jobs or transactions either in execution or waiting to be 
executed at that stage. The admissible state space is constrained 
because 

(t) There are at most C jobs being worked on at any one time 

(it) The system can hold at most M jobs at any one time. 
In a later section, we will show that the mean number of tasks in 
execution with operators, processors, and disks, averaged over a suit- 
ably long time interval, equals the mean throughput rate, A, multiplied 
by the total mean execution time for that stage. This is summarized in 
the following equations, where H(-) denotes a time average of the 
argument: 


Emin (Soperator, C)] = AT think 
E [min (processor; fg )] = AT processor 
Elmin(Jaisk, D)] = AT aisx. 


In a later section, we show that A can be upper and lower bounded, 
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given only this information, in terms of model parameters, as follows: 


Niowet = A = Anbper 


Cc 
Niower = C G 
Peni + ————— T processor & —— Frais 
me min(C, M, P)” min(C, M, D) °° 
Be ne eh min(C, M, P) min(C,M,D)  min(C, M) 
eee T processor : T disk : T rrocessar + Tie 


C 
Tinink + T processor se T disk i 


Each of the upper bounds on mean throughput rate has a physical 
interpretation, as follows: 

(t) The processors are limiting the maximum mean throughput 
rate 
min(C, M, P) 


TD orccekiak 


Aapper = 


(ti) The disks are limiting the maximum mean throughput rate 
a) 
Taisk 
(itt) The clerks are limiting the maximum mean throughput rate 
Cc 


pA een ee oR a cma 
“oe T think a T processor + T aisk 


(tv) Memory is limiting the maximum mean throughput rate 
min(C, M) 
T scowéssar = 5 T disk ; 


Navies = 


The lower bound on mean throughput rate has the physical interpre- 
tation of executing jobs one at a time on each processor/disk pair. The 
upper bound on mean throughput rate is associated with the best 
possible concurrency, while the lower bound on mean throughput rate 
is associated with the worst possible parallelism. The upper bound on 
mean throughput rate yields a lower bound on mean delay; the lower 
bound on mean throughput rate yields an upper bound on mean delay: 


c —- Ton =R= 


rae T think . 
Aupper Nlowet 








These bounds define an admissible or feasible region of operation and 
are plotted in Figs. 3 and 4 for the case of one processor and one disk, 
versus C = M. 

Two regimes are evident: a lightly loaded regime, where the number 
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Fig. 3—Mean throughput rate versus number of active terminals. 
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Fig. 4—Mean delay versus number of active terminals. 


of clerks is directly proportional to the mean throughput rate and the 
mean delay is independent of the number of clerks, so the clerks are a 
bottleneck; and a heavily loaded regime, where the on-line computer 
communication system is the bottleneck, with the mean throughput 
rate independent of the number of clerks and the mean delay directly 
proportional to the number of clerks. 

If we vary the number of clerks, there is a natural breakpoint 
between these two regimes: 


breakpoint number of operators = Aupperl T processor + Taisk + Think]. 


As long as the number of clerks is well below this breakpoint, the 
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clerks and not the system will be limiting the mean throughput rate, 
and the mean delay per transaction will be approximately the mean 
delay to execute a transaction with no contention. With the number of 
clerks well above this breakpoint, the system will be limiting the mean 
throughput rate, and the mean delay per transaction will be well in 
excess of that time to execute a transaction with no contention. 
Analysis suggests measurements to determine where these two regimes 
lie; synthesis would involve choosing which regime we wish to operate 
in (remember, we will always have some bottleneck!) and designing 
the system accordingly. 


2.3 Impact of memory constraint for one processor and one disk 


Here are two possible scheduling policies for a system with one 
processor and one disk: 

(t) Only one job is allowed into the system to be executed at any 
one time. This is called single-thread scheduling, and corresponds to 
the maximum number of jobs in the system equal to one, M = 1. 

(tt) More than one job is allowed in the system to be executed at 
any one time. This is called multiple-thread scheduling, and corre- 
sponds to M> 1. For M = 1 we see 


C 
ai A 
T think +C (T processor 5 Taisk) 7 ‘sinse oe 
; C 1 
= Cer + T processor za Tae T processor so cl ceca 


If we allow multiplexing of the processors and disks amongst transac- 
tions, then M > 1 is allowed, but now one or the other of the two 
serially reusable resources will become completely utilized for M 
sufficiently large: 

C ae 
[ee Sa 6, (/ EE ay | + C ( je + T sian) — multiple thread 


smin (Fea re) 


Think ++ fee oe Taisk 7 processor Taisk 


In either case, the lower bound on mean throughput rate is identical, 
but the upper bound can be different, owing to different bottlenecks: 
(z) The number of clerks is a bottleneck 


C 


Asingle thread» Amulti st SS ae 
ple thread — 
Tihink + T processor a Taisk 


(zz) The processor is a bottleneck 
1 


Amuliiple thread = 
f 5 eee 
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(tit) The disk is a bottleneck 
1 
Taisk 
(tv) Memory is limiting the maximum mean throughput rate 
1 
T processor + Tisk 





Amuttiple thread = 


Asingle thread = 


Provided that the clerks or operators are not limiting the maximum 
mean throughput rate, the ratio of the two different upper bounds is 
an indication of the gain owing to scheduling or allowing more than 
one job inside the system at any one time: 


Amultiple thread _ tT iroceaaat + Taisk 
Asingle thread max(T processor» T disk) 


For one processor and one disk, this gain owing to scheduling can be 
at most two, no matter what Tprocessor OF Taisk are! Moreover, this will 
only be achieved when Tprocessor equals Tuisk, but in general these two 
mean times will not be equal and hence the gain will not be as great as 
a factor of two; for example, if Tis, were ten times as great as Tprocessor; 
then the gain would be at most ten percent, and other factors ignored 
in this analysis may swamp this. 


2.4 Asymptotics 


We close with an investigation of asymptotic behavior of this system. 
One type of asymptotic analysis is to let all parameters be fixed except 
one, and the final one becomes progressively larger and larger. Here a 
natural candidate for such a parameter is the number of operators or 
jobs circulating in the system C. As the number of operators or clerks 
becomes large, C — ©, we see 


1 


TE piecessor Tain 


min(P, M) min(D, M) 


in ( P D M 
= min | ———_- ——_——_——-}], C> », 


2w7? 
T processor T aisk TD igceator be Taisk 


=2X 


This in turn will yield upper and lower bounds on mean response time: 








( c ~ Tain) > 09 = R= ( e ~ Tai) + 99 C2 
Mower upper 


In other words, the mean throughput rate lies between two finite 
bounds, while the mean response time is infinite (will exceed any finite 
threshold as we add more and more clerks). 

A second type of asymptotic analysis is to fix the ratio of two 
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parameters, and allow them both to become progressively larger, 
holding all other parameters constant. Here, a natural candidate is the 
ratio of the number of jobs over the mean think time per operator, 
which we denote by a, which is a measure of the total offered rate of 
submitting jobs: 





a= ,C—> ©, Think > ©. 


think 

We allow the number of jobs or terminals to become large, as well as 
the mean intersubmission time of jobs from each terminal, thus weak- 
ening the contribution to the total offered rate of each terminal. In the 
literature, an analogous procedure is called passing from the so-called 
finite source to infinite source arrival process (e.g., see Ref. 18, pp. 
102-3), granted certain additional distribution assumptions that we do 
not make here (e.g., see Ref. 18, pp. 80-2). If we fix a while allowing C, 
Ttnink > ©, We see 


Qa 


San KS eS | =X 
1a processor z. disk 


min(P,M) min(D, M) 
: ( P D M ) 
<= min (————_ = I. 


Tioscassoe Tass Cprsscsion Laide 
This in turn yields the following lower bound on mean delay: 
1 
aca T processor Taisk M | 
min(P, M)’ min(D, M)’ Tprocessor + Taisk 


00 a> 


R= T processor o TL aiats 
1 


max T processor Taisk T processor a Taisk 
min(P, M)’ min(D, M)’ M 


a< 


The remaining case, an upper bound on mean delay or mean response 
time, is trivial: 

Rs, a fixed, C—> ©, Think > ©. 
Additional (distributional) information must be available to allow us 


to handle the case where 
1 


as T processor Taisk T processor + Taisk 
min(P, M)’ min(D, M)’ M 


Inituitively we see that if the total mean arrival rate is less than the 


a= 
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upper bound on the mean throughput rate, then the system is capable 
of having a finite lower bound for mean response time; when the total 
mean arrival rate is greater than the upper bound on the mean 
throughput rate, then the mean response time lower bound is infinite. 
Note that the mean throughput rate lies between two finite limits, 
while the mean response time can lie between a finite and infinite 
value, given only mean value information, i.e., the mean response time 
is not well bounded given only this amount of information. This is well 
known in other types of queueing systems, such as the M/G/1 system 
(e.g., see Ref. 18, pp. 189-92), where the mean delay depends not only 
on the first moment of the service time distribution but also the second 
moment of the service time distribution: mean value information does 
not specify the mean delay in such systems by itself, but rather we 
need the actual distribution of service time to deal with this issue. 


ll. PROTOTYPE DIRECTORY ASSISTANCE SYSTEM CASE STUDY 


Here is a case study in using these techniques. A prototype of an on- 
line transaction processing system was built to handle telephone 
number directory assistance queries. In a typical cycle of operation, a 
person at a terminal would 

(t) Receive a query from a customer via voice telephone 
(itt) Enter the given information into a computer terminal while 
talking to the customer 

(tit) Wait for the system to respond with the answer to the query 

(tv) Tell the customer over the voice telephone the reply 

(v) Close out customer interaction 

(vi) Wait to receive the next customer query. 

The hardware configuration for the system consisted of C terminals, 
a single processor, a single disk controller, and a single disk spindle. 
An operating system coordinated scheduling and management of these 
devices, while a set of prototype application programs handled trans- 
action processing. 

Measurements on the prototype system in operation showed that 

(t) The mean time spent by a person talking, reading, and thinking, 
denoted by Tinink, was twenty seconds 
(tt) The mean processor time per transaction was broken down 
into three sets of application programs 
(a) The operator interface front-end programs consumed 180 
milliseconds of processor time per query on the average 
(b) The index manipulation application programs consumed 
420 milliseconds of processor time per query on the average 
(c) The data retrieval application programs consumed 330 
milliseconds of processor time per query on the average 
(d) Miscellaneous application programs that were invoked for 
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accounting, system administration, and other purposes con- 
sumed one hundred and forty milliseconds (140 ms) per 
query 
Hence, the total mean processor time per query, Tprocessor; WAS 1.07 
seconds 

(iit) The mean number of disk accesses per query was twenty six 
(26), with the disk capable of making one access every twenty five 
milliseconds (25 ms), which results in a mean time the disk is busy per 
query, denoted Tuisx, of six hundred fifty milliseconds (650 ms). 

The above measurements on total mean processor time and disk 
access counts were based on examining the mean resources required 
for one hundred different transactions to the system; the measurement 
error on the processor time was felt to be under ten milliseconds, while 
the measurement error on the number of disk accesses was felt to be 
under one access. For this level of analysis, the upper and lower mean 
value bounds on mean response time are given by 


C 


max T ntocessat Go Taisk, 
max(T processor» Taisk) 


_ Tos 


=Rs C( T processor a Taisk); 


while the associated upper and lower mean value bounds on mean 
throughput rate are given by 


C 
ZA 
Think + CUT eco a Taisk) 


ee C 1 
= aS OEE ke 
Think + f ese + Tadic max(T processor, T disk) 


These bounds are plotted in Figs. 5 and 6, along with observed data 
gathered over an eight-hour time interval with twelve C = 12 operators 
and calculations based upon a closed queueing network model obeying 
product-form-type solution. The goodness of fit of the closed queueing 
network model to actual data was felt to be acceptable for the purposes 
at hand; the mean value lower bound on mean delay and upper bound 
on mean throughput rate were also felt to give an indication of 
performance limitations at an early stage of development, which the 
data gathering and refinement via a closed queueing network model 
only strengthened further. Note that the system is achieving a great 
deal of concurrency, because the actual mean throughput rate is much 
closer to the upper bound, not the single-thread lower bound. Similar 
observations hold for mean delay. 


IV. PROTOTYPE DATA BASE ADMINISTRATION SYSTEM CASE STUDY 


A transaction processing system administers the data base for a 
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Fig. 5—Mean throughput rate (bounds and data) versus Toprocessor- 


second system that switches telephone calls; hence this system is called 
a front-end system to the back-end telephone call switching system. 
Transactions involve additions, deletions, and changes to existing 
telephone numbers in the switching system files. A prototype system 
had a hardware configuration consisting of a single processor, a single 
disk controller, and a single disk spindle, with a fixed number of 
asynchronous terminals. This same prototype had an operating system 
to coordinate and schedule these resources, while application programs 
handled the transaction processing. The application programs were 
structured into a front end for handling operator terminal interactions, 
a data base management system, and a back-end communications 
system for interacting with various switching systems. 

The same formulas for upper and lower mean value bounds on mean 
response time and mean throughput rate hold as in the previous 
example, except for a change in the numbers. 

Two sets of measurements were gathered, one at the start of per- 
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Fig. 6—Mean delay (bounds and data) versus Tprocessor. 


formance analysis, labeled initial in Table I, and one after completing 
two months of performance analysis, which involved recoding appli- 
cation programs to take better advantage of operating system features, 
with the same hardware configuration, labeled final in Table I. Mea- 
surements were carried out in a controlled environment where the 
actual hardware, operating system, and application programs were 
used, but the operator behavior was simulated by a second computer. 
The behavior of each operator was modeled by a script, involving a 
time for reading, thinking, and typing, followed by a time waiting for 
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the system to respond. After an initial startup transient the measure- 
ments of response time were quite predictable for all operations, with 
the measurement error being one second at most. Each operator 
submitted tens of jobs, and the results were averaged over all operators 
and all jobs, so the final statistics were felt to be statistically repro- 
ducible, to within a fraction of a second. 

Figures 7 and 8 plot the mean value upper and lower bounds as well 
as data from these measurements for the mean response time and 
mean throughput rate as a function of number of operators. The 
goodness of fit to mean value bounds was felt to be acceptable for the 
purpose. Unlike the first case study, the data here clearly shows that 
a great deal of fluctuation was encountered in system operation under 
load: for the initial system, the fluctuations were so great that the 
system apparently was always executing only one transaction at a 
time, while for the final system, as load built up, the system effectively 
moved from a regime of two tasks making use of both serially reusable 
resources to a regime where only one task at a time was in execution. 
This is in contrast to the other set of data, where the system is always 
achieving a great deal of concurrency under load. A closed exponential 


Table |1—Prototype system measurements 


Quantity Initial (seconds) Final (seconds) 

Tenink 15.0 15.0 
processor 8.2 3.5 

Taisk 5.0 0.5 
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Fig. 7—Mean throughput rate (bounds and data) versus number of active clerks. 
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Fig. 8—Mean delay (bounds and data) versus number of active clerks. 


queueing network model of this system would predict behavior that 
closely tracked the upper bound on mean throughput rate and lower 
bound on mean delay, and would simply not allow for sufficient 
fluctuation to drive operation into a mode of operation of executing 
one task at a time. In fact, this suggested a problem with memory 
management that was forcing the system into this mode of operation; 
an obvious test that was not carried out owing to lack of time was to 
add more main memory to see if more concurrency might be achieved. 


V. A MODEL OF FLOW CONTROL OVER A SINGLE LINK 


An on-line communications system consists of operators at terminals 
who send messages to one another. The system consists of a transmit- 
ter and a receiver, with communication channels connecting the trans- 
mitter and receiver. The receiver is capable of buffering only a maxi- 
mum number of messages at any one time, which is a memory 
constraint. 


5.1 Model description 


A communications system is composed of a transmitter processor, 

a receiver processor, a set of buffers each capable of holding one 

message at the receiver, and a noiseless communications link. Here 

are the steps involved in sending a message from the transmitter to 
the receiver: 

(t) The transmitter processes a message. This step has a mean 
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duration Trans at the transmitter, and it requires both the transmitter 
and a buffer at the receiver. 

(tt) The message is sent over the link from the transmitter to the 
receiver. This step has a mean duration T'rans-rec- 

(iii) The receiver processes the message. This step has a mean 
duration Tyec. 

(iv) An acknowledgment of correct receipt of the message is sent 
from the receiver to the transmitter. This step has a mean duration 
Trec-trans. At the start of this step, the receiver marks the buffer free. 

(v) The transmitter processes the acknowledgment. This step has 
a mean duration of Tack. 

At the end of this step, the transmitter marks the buffer free. 

We assume from this point on that the time required by the trans- 
mitter to process the acknowledgment is zero. Figure 9 shows a 
hardware block diagram of the system. Figure 10 shows a queueing 
network block diagram of the system. The system state is denoted by 
{2 where 


Q = {(J; trans » J trans-rec 9 Tree, J, rec-trans) | Ji trans 
Ss Strans-rec = Irec +, rec-trans = B} 


At any instant of time, the system is in a state given by a four tuple, 
(Strans, Jtrans-recy Jrec, Jrec-trans), Where each component is nonnegative 
and integer valued, and the state space constraint is obeyed. 

The mean throughput rate is denoted by A. The mean number of 
jobs in execution in the transmitter and in the receiver equals the 
mean throughput rate multiplied by the total mean execution time, as 
shown in a later section. We denote by E(-) the time average of the 
argument, and write: 


NP sats = E[min(J trans, Fsnd = 1)] 
AT rec = E{min(Jyrec, Prec = 1)]. 


Our goal is to find upper and lower bounds on mean throughput rate, 
subject to meeting state space constraints. 
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Fig. 9—Hardware block diagram. 
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Fig. 10—Queueing network model. 


In a later section, we show 
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The physical interpretation of the upper bound on mean throughput 
rate is as follows 
(i) The transmitter is the bottleneck 
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The physical interpretation of the lower bound is that at most one 
message at a time is being handled by the system. 

Figures 11 through 13 plot these upper and lower bounds, as well as 
the results of an exponential queueing network analysis,’* for the 
special case where 


Lats = Tees Trans $00 = rec-trans 


for three different cases, where the propagation delay is much smaller, 
equal, and much larger than the mean processing time at either end of 
the link. The fraction of time the queueing network model predicts the 
system to be in state J is denoted by 7(J), where 


J, trans-rec Jrec-trans 


I. dT transxee T eckpais 
J =—_ Twxans?™™ ec Tre 
ms ) G ; imnaacee J, eens! 
The system partition function denoted G is chosen to normalize the 
probability distribution: 
yy a(J) = 1. 


SMB 


5.2 Negligible link propagation delay 


We now restrict attention to the special case where the propagation 
delay is negligible compared to the processing at either end of the link, 
from this point on. For one buffer, the mean throughput rate is upper 
bounded by 

1 1 


De ON ga ce eee eee ed 
ts TF gsi =F | ees + o hes + tT recttans 5 Diack f one = L'sée 


There is no concurrency or parallel execution of messages, and the 
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Fig. 11—Line utilization vs. number of buffers (Ttrans-rec = Ttrans/ 10). 
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Fig. 12—Line utilization versus number of buffers (Ttrans-rec = Ttrans)- 
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Fig. 13—Line utilization versus number of buffers (Trans-rec = 10 Trans). 


total time required for message handling is the sum of the individual 
steps. 

For more than one buffer, this will yield an upper bound on the 
mean throughput rate of simply B times the mean throughput rate for 
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one buffer: 
B B 


No Nig a eee 
ae of eres + i ee + Tee + T eecteans + Packs P ea ce f ies 


On the other hand, as the number of messages increases, then either 
the transmitter or the receiver (or both) will become completely busy, 
yielding different upper bounds on mean throughput rate: 
(t) The transmitter is a bottleneck 
1 1 


> ne ree 
ante Divens + f yee 


(it) The receiver is a bottleneck 
1 
7: 


A S Xupper = 





Combining all this, we see 


1 1 
ky = in| —— 
is Tivans + Tick reenter 


B 
Trans i Desnate 5 f eee + f Ee = =) 
: 1 1 B 
<= = — , — ,§ —__—_——_ ]. 
‘ = Aapper cael Coat es T trans a3 z=) 


Increasing the number of buffers from one to two, B = 1 to B = 
always increases the maximum mean throughput rate, and now we see 





1 1 
A= Aupper = min — 7) B> 1. 

Furthermore, this increase is maximized for Tirans = Trec, and then 
the upper bound doubles in going from one buffer to more than one 
buffer. Why is this so? By having more than one buffer, both the 
transmitter and receiver can simultaneously be filling and emptying a 
buffer, allowing greater concurrency or parallelism compared with the 
single-buffer case. We also note that allowing more than two buffers, 
e.g., infinite buffers, will not increase the upper bound on the maximum 
mean throughput rate any further. This is because there are only two 
serially reusable resources, a transmitter and a receiver, so once they 
are concurrently busy, no further gains can be achieved. 

For the lower bound on mean throughput rate, we see that 


B 7 1 
B Trans + B vee - Trans + Tree : 


which is identical to the upper bound for B = 1. Why is this so? There 
may be significant fluctuation about the mean values shown above, 


A = Mower = 
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and in the limit of one big swing about the mean value all of the 
messages will pile up at one stage in the network and nothing will be 
transmitted until buffers become available. 


5.3 Impact of fluctuations 


We now examine one special case of this problem in detail, where 
Tirans-rec = Trec-trans = Tack = 0, and we wish to study the impact of 
fluctuations about mean values on system performance. We assume 
the transmitter processing times are sequences of independent identi- 
cally distributed exponential random variables with mean Trans. We 
assume the receiver processing times are sequences of independent 
identically random variables with common hyperexponential distri- 
bution Greceiver(X ): 


Greceiver(X ) = (1 = a) - a(l = e 7 *rec) , 


In words, a fraction 1 — a will require zero processing time at the 
receiver, while a fraction a will require an exponentially distributed 
amount of processing time with mean 1/prec. The parameter a gives us 
an additional degree of freedom to model fluctuations in the receiver 
processing times. For this case, we choose to fix the squared coefficient 
of variation denoted by C?, which for the random variable X is defined 
as the ratio of the variance to square of the mean (the standard 
deviation, measured in units of mean value, squared): 


variance(X) _ (2 
EX) 


When this is zero, the variance is zero, and there is zero fluctuation 
about the mean. When this is one, we have an exponential distribution, 
where the standard deviation equals the mean. When this is greater 
than one, the standard deviation is greater than the mean. For this 
particular case, we see 0 < a <= 1 and hence 


squared coefficient of variation = 


Cre Z -121. 
a 
If the mean is fixed but a is varied from one (the exponential distri- 
bution case, where the fluctuations are the order of the mean) to zero 
(increasing fluctuations about the mean), with most jobs taking zero 
time but a few taking a very long time, we can gain insight into the 
impact on performance. Since we have fixed the squared coefficient of 
variation, the mean is also fixed, since 


Qa 
rec 


The distribution of the number in the receiver subsystem at the 





sec = 
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completion of processing at the receiver of a message is denoted by 
F(K), K =0, --- , B. Ifnone are left behind, then the mean time to the 
next completion epoch is Trans + Trec. If more than zero are left behind 
at the receiver, then the mean time to the next completion epoch is 
Tec. The mean throughput rate is given by 


1 1 
~ F(0)(Terans + Treo) + [1 — FO) |(Trec)  F(0)Terans + Tree’ 


Once we find the distribution of the number of messages in the system 
at completion epochs, we are done. However, this is a well-known 
result (see Ref. 18, pp. 235-40), and we merely summarize the known 
formulas here for the sake of completeness: 


Q (0) 


B-1 


» Qk) 


_ K=0 


r 


F(0) = 


The terms Q(K), K = 0, ---, B — 1 are given implicitly via the 
following moment-generating function £(X): 


(1 — X)E[eR Fee] 


= < Pa oe Se See 
1X) = 9 Q(x" = Sam x 


Illustrative numerical results are plotted in Figs. 14 through 16 assum- 
ing the transmitter and receiver service times are independent, iden- 
tically distributed, exponential random variables. We note that for the 
special case where Trec = Trans = 1, the mean throughput rate is given 
by 
2(B+1 
aacrse 
A =——————._ C21. 
2(B — 1) 
er a 
C*+1 


Here we see that as C? > that the mean throughput rate approaches 
the lower bound of 1/2 arbitrarily close, i.e., there is no concurrency or 
gain in going to more than one buffer if the fluctuations are too great. 
On the other hand, as B > for C? fixed, the mean throughput rate 
approaches one, which is the best possible. The numerical plots show 
in which regime which phenomenon (the fluctuations or the buffering 
and concurrency) dominates the actual mean throughput rate. The 
impact of speed mismatch (i.e., as the transmitter and receiver mean 
message execution times start to differ) tends to swamp the impact of 
fluctuations: the greater the speed mismatch, the greater concurrency 
is achieved, because the exact mean throughput rate approaches the 
upper bound closer and closer as the speed mismatch increases be- 
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Fig. 14—Mean throughput rate vs. number of buffers for a closed queueing network 
model (Trans = 1.0 and Trec = 1.0). 
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Fig. 15—Mean throughput rate vs. number of buffers for a closed queueing network 
model (Trans = 0.5 and Tree = 1.0). 


E(T7RaNsMITTER) = 0.2 SECOND 0 LOWER BOUND 
E(Treceiver) = 1.0 SECOND O SQ COEF VAR = 10 
A SQ COEF VAR =3 
@ SQ COEF VAR =1 
& UPPER BOUND 


MESSAGES PER SECOND 





0 2 4 6 8 10 12 
BUFFERS 


Fig. 16—Mean throughput rate vs. number of buffers for a closed queueing network 
model (Trans = 0.2 and Teaz => 1.0). 
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tween transmitter and receiver. Note that the upper bound on mean 
throughput rate corresponds to a squared coefficient of variation of 
zero, while the lower bound corresponds to a squared coefficient of 
variation that becomes infinite. 

We now discuss this phenomenon in more detail, because the for- 
mulas give only one way of understanding this model. Figure 17 shows 
a sample path generated from a simulation of the model, for a total 
number of five jobs in the system. In the initial part of the simulation, 
the first stage fluctuates between four and five jobs, while the second 
stage fluctuates between zero and one job; in the final part of the 
simulation, the situation is reversed; after sufficiently long time, we 
would return to the first case. When most of the jobs are at one stage, 
the mean throughput rate is roughly the reciprocal of the time to 
execute one job from start to finish, and there is no concurrency. The 
other cases, where there are multiple jobs at each stage, are transient 
and the system spends relatively little time in these states. 

The analysis developed above can make these intuitive notions more 
precise. For example, the mean fraction of time that there are zero 
jobs at the receiver is 
fraction of time zero jobs at receiver = =O . ; 


F(0) + T.. 
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Fig. 17—Sample path generated from a simulation of the model for (a) first stage vs. 
time and (b).second stage vs. time. 


566 THE BELL SYSTEM TECHNICAL JOURNAL, FEBRUARY 1983 


while the fraction of time all the jobs are at the receiver is 


1 


{es ; 
se et 


fraction of time all jobs at receiver = 1 — 


As we allow a — 0, i.e., as the fluctuations and squared coefficient of 
variation become larger, while the mean time spent at the transmitter 
and receiver stay fixed, we see that the sum of these two fractions can 
be made arbitrarily close to one, which is what the simulation result 
in Fig. 17 shows. At the same time, we see that the mean sojourn time 
in the state where the receiver is empty is given by 


mean sojourn time in idle receiver state 


Tide 
_ 





=) (1- a)* aK T rans = 
K=1 


Put differently, if one were to measure the operation of this system, 
the system might be in the receiver idle state for the entire duration 
of the observation process, and the other state of the receiver having 
all jobs (which will also become successively longer and longer as 
a — 0) will never be observed, or vice versa! In Fig. 17, this would 
correspond to gathering data in the first part of the simulation, never 
in the second part, or vice versa. 


5.4 Queueing network analysis for negligible propagation delay 


In a later section, we show that the mean throughput rate is upper 
and lower bounded by 


Alower = nt = Aupper 


1 
A ower SS pe PTO a 
ee Tesora Troe WAX Tessas, Tree) 
1 
Nupper = 


dain “- Prec + 1/2(T rans + Tyee) ; 


The mean value bounds, the queueing network upper and lower 
bounds, and exact queueing network analysis mean throughput cal- 
culations are all plotted in Figs. 18 through 20 for Ty. = 1.0 and 
Trans = 1.0, 0.5, 0.2. The queueing network bounds are identical to the 
exact analysis when the transmitter and receiver execute messages at 
the same rate. When the transmitter becomes faster than the receiver, 
the bounds and exact analysis tend to track the upper bound on mean 
throughput rate; in other words, the speed mismatch is of greater 
importance than the impact of fluctuations. 
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Fig. 18—Mean throughput rate vs. number of buffers for zero propagation time (Trans 
= 1.0 and T,.- = 1.0). 
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Fig. 19—Mean throughput rate vs. number of buffers for zero propagation time (Trans 
= 0.5 and Tye = 1.0). 


5.5 Experimental data 


To test predictions of this analysis against actual operations, a series 
of experiments were carried out to determine the mean maximum 
throughput rate of a data communications link constructed with two 
computers, one transmitting and one receiving, over a data link where 
the link propagation time was negligible compared to the data com- 
munications processing at either end of the link or the data transmis- 
sion time of a packet over this link. The test described here involved 
sending 51,200 bytes of data over a 9600-b/s data link; similar results 
were found for a 1200-b/s data link. The source data were encoded 
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Fig. 20—Mean throughput rate vs. number of buffers for zero propagation time (Trans 
= 0.2 and Tyrec = 1.0). 


into packets containing either 32, 64, 128, or 256 bytes (one byte equals 
eight bits) of data. The system and numerous other details of the 
experiment will be described elsewhere in a different report. 

We wish to test the gain in going from start-stop or single buffering 
to double buffering and to greater than double buffering; our previous 
analysis assumes that a mean value of data communications processing 
time at the transmitter and receiver adequately characterizes the 
system performance. 

The experiment involved simply measuring the time required to 
transmit 51,200 bytes of data over each link for each size packet. No 
processing was done on the data at either the transmitter or receiver 
other than to do the data communications processing required for 
correct operation. The transmitter and receiver processes resided in 
the same PDP 11/45 computer with a UNIX *-like operating system 
environment. 

Table IJ summarizes the results of that experiment. The time 
required to send each of 51,200 bytes of data plus two additional bits 
(for parity and control) over a serial 9600-b/s data link is 53.3 seconds; 
thus, the data link transmission speed and not the transmitter or 
receiver is limiting data flow here. This can also be seen directly by 
noting that the link utilization is approaching one hundred per cent in 
Table II. This table shows that double buffering at the receiver offers 
substantial improvement in mean message throughput over single 
buffering, and there is no apparent advantage in terms of throughput 
in choosing a receiver buffer larger than two (e.g., seven was tried). 
Finally, this suggests that for this purpose this level of analysis is 


* Trademark of Bell Laboratories. 
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Table II—PDP 11/45 loop-around experiment—maximum mean 
throughput rate for transmission of 51,200 bytes over 9600-b/s 


data link 
Maximum 

Number of Packet Size Throughput Link Utiliza- 
Buffers (bytes) Time (seconds) b/s) tion (percent) 

1 32 160.0 3200 33 

1 64 125.0 4096 43 

1 128 90.0 5688 59 

1 256 80.0 6400 67 

2 32 80.0 6400 67 

2 64 58.5 8752 91 

2 128 55.5 9225 96 

2 256 55.0 9309 97 

7 32 64.0 8000 83 

7 64 58.0 8827 92 

7 128 55.0 9309 97 

7 256 54.5 9412 98 


appropriate, i.e., that other phenomena that are present are in fact 
negligible for these purposes, as shown by the data. 


VI. CONCLUSIONS 


A performance study of an computer communication system may 
be carried out in at least one of three ways: 

(t) Mean value analysis, as described here 
(ii) Jackson queueing network analysis® 

(iii) Discrete event simulation model?" 

In this paper we have demonstrated the ability of the mean value 
analysis to present a clear picture of the dependence of computer 
communication system performance on the values of the model param- 
eters. The mean value analysis is a simple, flexible, inexpensive ap- 
proach to performance analysis and should always be used, even if it 
is required to supplement the analysis with one or both of the other 
techniques. The other approaches quantify the impact of fluctuations 
about mean values on performance, refining the mean value analysis. 

The utility or validity of any of these approaches cannot be judged 
in the abstract: whichever approach or combination of methods is 
most appropriate must be judged in terms of the data gathering and 
measurements, and how the data is used to draw inferences concerning 
cause and effect phenomena, coupled with the spectrum of practical 
feasible alternatives. The mean value approach presented here is 
simply one tool for carrying out this complex decision-making process. 
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APPENDIX A 
Little’s Law 


Jobs enter a system, spend time within the system, and depart. The 
system attributes of interest here are: 
(z) L(t) denotes the number of jobs in the system at time ¢ 
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(zi) C(t) denotes the number of completions in the time interval 

(0, ¢] 

(tii) Every job that enters the system leaves the system. 
Our goal is to relate the mean throughput rate of jobs, the mean time 
a job spends in the system, and the mean number of jobs in the system. 

The total mean time spent by all the jobs in the system is simply 
the area underneath the function L(é): 

t 

total mean time in system by all jobs = [ L(t)dr. 


0 


The total mean time spent in the system by any one job is given by 


| L(r)dr 
mean time in system per job in (0, ¢] = a 
We multiply and divide by ¢ as follows: 
L(r)dr 
mean time in system per job = —— x Cost’ 


The first term is simply the mean number of jobs in the system, 
averaged over a time interval of duration t: 
t 
[ L(t)dr 
a, 


mean number of jobs in system in (0, ¢] = ; 


The second term is simply the mean throughput rate: 


C(t 
mean throughput rate in (0, ¢] = <8 : 


Hence, we have shown that the mean number of jobs inside the system 
equals the mean throughput rate multiplied by the mean time in 
system per job, all over an interval of duration (0, ¢]: 


mean number of jobs in system in (0, ¢] 
= mean throughput rate in (0, ¢] X mean time in system/job in (0, ¢]. 


If the observation interval becomes infinite, t — ©, and the mean 
values defined here in fact stabilize and do not fluctuate, then we have 
what is called Little’s Law.’*”’ These other derivations rely on aver- 
aging over an ensemble of equally likely experiments, and draw on 
deep results from the theory of stochastic processes;”””* the difficulty 
is in showing that the limits in fact exist in a meaningful mathematical 
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sense. Here we focus exclusively on time averages of quantities of 
interest, since these can be readily measured. 

We close with an application of this result that we will use in the 
following section. Jobs arrive for processing by a system. Each job 
requires a total mean amount of service T. The system consists of a 
single queue feeding P identical processors. At any given instant of 
time, there are J jobs in the system, either running or waiting to run. 
The mean throughput rate of jobs is denoted by X. 

We now restrict attention to a subsystem of the total system, the 
subsystem of jobs in execution. Since we have P processors, the 
number of jobs in execution at any instant of time is min[J, P]. Hence, 
we see that the mean number of jobs in execution, averaged over a 
time interval, equals the mean throughput rate multiplied by the mean 
time a job spends in execution: 


mean number of jobs in execution = E[min(J, P)] = AT. 


The actual service pattern of the job is not of interest here: a job may 
actually consist of a series of steps with different processing at each 
step, and at the conclusion of each step of execution the job returns to 
the end of the queue (or to some point in the queue based upon the 
step) until it is completely executed. 


APPENDIX B 
Mean Value Analysis 


We now present the mathematical analysis to justify assertions in 
earlier sections. The model we deal with is a system handling only one 
type of job or transaction. Each transaction consists of one or more 
steps; at each step, a given amount of a serially reusable resource is 
required for a given time interval. A resource is any entity that is 
required for subsequent execution of a transaction; examples of phys- 
ical hardware resources are processors, memory, disk spindles, disk 
controllers, communication links, local backplane buses and so forth; 
example of logical software resources are files, tables, messages, sem- 
aphores, and so forth. Here the first step of each transaction involves 
entering the transaction into the system via an operator at a terminal; 
the second step of each transaction involves placing the transaction in 
a staging queue where it will wait if there are more than a given 
maximum number of jobs already in the system and otherwise will 
enter the system immediately; and it will go through one or more 
additional steps inside the system, where the job holds a single serially 
reusable resource for each step of execution and then moves on, until 
the job is completely executed and control returns to the operator at 
the terminal. For each step of each transaction, we are given the 
amount of each resource and the mean time required to hold that set 
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of resources. We denote by 7x the total mean time spent by a 
transaction holding resource type K, which we stress is the sum total 
execution time of all visits to that stage by a transaction. 

The mathematical model consists of 

(t) N + 2 stages of stations: station 0 is associated with operators 
at terminals, station 1 is the staging station, and stations 2, ---, N+ 
1 (N total) are associated with a single, serially reusable resource 

(ti) Stage K = 0, 2, --- , N+ 1 has Px identical parallel servers or 
processors 

(tii) A maximum of M jobs can be held at all stages K = 2, ---, 
N+1 

(tv) Each job moves from station to station, and requires 7x total 
mean amount of service time at stage K = 0, 2,---, N+ 2. 

Figure 21 is a queueing network block diagram of this system. 

We denote by A the total mean throughput rate of completing jobs; 
R denotes the total mean response time (queueing or waiting time plus 
execution time) per job. The system state space is denoted by (. 
Elements in the state space are denoted by J = (Jo, --+ , Jai). Ix, 
K = 0, 2, --- , N + 1 denotes the number of jobs either waiting or in 
execution at stage K. 

Feasible elements in the state space obey the following constraints: 

(t) The total number of tasks in the system is fixed at Po 
N+1 


Po = |d| Pee dx. 


SYSTEM 


OPERATOR STAGING 
QUEUE QUEUE 
(0) (1) 


Q 


Po 
OPERATORS 


“~N QUEUES 


MAXIMUM 





Fig. 21—Block diagram of the queueing network model. 
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(it) There can be at most a maximum of M jobs inside the system: 


N+1 
»> Jxk= min[M, Po- Jol. 


K=2 


Combining all these, we see that elements J in Q are nonnegative 
integer-valued tuples such that 


JEX= {V|V= (Vo, +--+, Vier); Vv=O0K=0,---,N+1; 


N+1 N=1 
»y Vk = Po; > Ve = min(M, Po- Vo)}. 
K=0 K=2 

The number of jobs in execution at stage K = 0, 2, --- , N+ 1 is given 


by min(Jx, Px) at any given instant of time. From the previous section, 
Little’s Law allows us to write: 


mean number in execution at stage K 


= E[min(Jx, Px)] =ATx K=0,2,---,N +1, 


where E(-) denotes the time average of the argument. Our goal is to 
find upper and lower bounds on A subject to the state space constraints 
on dx, K=0,---,N+1. 
Since mean throughput rate and mean response time or delay are 
related via 
Po 
Tot R 


we will also obtain associated lower and upper bounds on mean delay. 





> 


B.1 Lower bound on mean throughput rate 
We first divide both sides of the following equation 
E[min(Jo, Po)] = AT» 
by Po. In like manner, we divide both sides of the following equations 
ATx = E[min(dJx, Px)] K=2,---,N+1 
by min(M, Po, Px). Now we add up these N + 1 equations: 


E{min(Jo, Po)] , \\' Elmin(Jk, Px)] 
Po K=2 min(M, Po, Px) 
N+1 
Py xz=2 min(M, Po, Px) 
Now we interchange the mean value with the summation on the left- 
hand side: 
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| mae Po) | os min(Jx, Px) 
Po K=2 min(M, Po, Px) 
Tt. N+1 Te 


E py min(M, Po, Px) 


Our goal is to lower bound the left-hand side by one, which will yield 
a lower bound on A. 

Two cases can arise. First, there can exist one J = 2, --- , N+ 1 such 
that P; = J7. Since all the terms on the left-hand side are non-negative, 
we can lower bound the left-hand side by ignoring all of these terms 
except term I: 


min(eJo, Po) % N+1_ min(Jx, Pr) a min(eJ7, Pr) 
Po K=2 min(M, Po, Px)” min(M, Po, Pr) 
Pr 


>=—__—__—->]1 [=2,---,N+1. 
min(M, Po, Pr) , 


Second, for all K = 0, 2, ---, N+ 1, Px > dx and hence 


min(Jx, Px) =Jx K=2,-+--,N+1. 


Two subcases arise: if Pp — Jo = M then there is no waiting by any job 
in the staging queue, and 


Joi | Na! Jx Jo Po- Jo _ 


ae ee 1. 
Po & min(M, Po, Px) Po Po 





The other subcase is if Po — Jo > M and then there is waiting in the 
staging queue, so 
min(dJo, Po) att dx ihe Jk 


M 
a Se SS 
Po K=2 min(M, Po, Px) Py min(M, Po, Px) M 


Hence, we see that 


T N+1 Tx 
A| =—+ ¥ ————__| 2 1 
Po K=2 min(M, Po, Px) 
and we obtain the desired lower bound: 
Po 
N+1 Po 


To + enc nero ff 
o+ 2 inn P, Pa) 


Alowee = 


The total mean time to execute a job at each stage in the system has 
been stretched from Tx, K = 2,---,N+1to Tx, K=2,.--- ,N+1, 
where 
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ve Po 
=> Te > T, K=2,---,N+1 
Vx min(M, Po, Px) ~~ * 


Po 


Mower rat rs te 


To + yy Tx 
K=2 


which is one way of quantifying the slowdown at each node owing to 
congestion. 
B.2 Upper bound on mean throughput rate 
From the definition of A we see 
_ Efmin(Jx, Px)] _ min(Px, Po, M) 
Tr = Tx 
From this same identity, we obtain a second upper bound: 


ATK S E(Jx) K=0,2,---,N+1 
N+1 N+1 
ax(104 > Tr) = E(so+ »y Jn) = Pa 
K=2 


K=2 


rN K=0,2,---,N+1. 


The constraint on the maximum number of jobs inside the system can 


be written as 
N+1 
y. Jx Ss min(Po, M). 
K=2 
If we use Little’s Law, we see 
N+1 N+1 


AY Tx= Y E(dx) < min(M, Po). 
K=2 K=2 


In summary, we have shown 


ae a Px, ae Po min(M, Po) 
=09... T: 2 N+1 ; N+1 
K=0,2, N+1 K T, + y Tx y Tx 
K= K=2 


2 


B.3 Interpretation 


One intuitive explanation for these bounds is the following. To 
achieve the upper bound on mean throughput rate, each step of job 
execution has little fluctuation relative to its mean value, and jobs 
interleave with one another. The mean throughput rate can be upper 
bounded via the following mechanisms: 

(¢) The total number of jobs circulating in the system is limiting 
the mean throughput rate; in this regime, as we increase the number 
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of jobs, the mean throughput rate increases in roughly the same 
proportion 

(ti) One stage is executing jobs at its maximum rate, limiting the 
mean throughput rate; in this regime, as we increase either the speed 
of each processor at the stage, or the number of processors with the 
same speed, the mean throughput rate increases in roughly the same 
proportion 

(uz) The constraint on the maximum number of jobs in the system 
is limiting the mean throughput rate; in this regime, as we increase the 
allowable maximum number of jobs in the system, the mean through- 
put rate increases accordingly. 

To achieve the lower bound on mean throughput rate, each step of 
job execution has large fluctuations relative to its mean value, so that 
all jobs in the system are congested at one node. A different way of 
gaining insight into this lower bound is to replace the service or 
processing time distribution at each node with a bimodal distribution 
with the same mean as the old distribution, where (1 — ex) denotes 
the fraction of jobs at stage K that are executed in “zero” time and ex 
is the fraction of jobs at stage K that are executed in time 1/px such 
that Tx = €x/px. Here in normal operation two things can occur: the 
mean time for a job to cycle through the network will be roughly zero, 
since most stages will take zero time, and hence the number of jobs in 
circulation will limit the mean throughput rate, or one stage of exe- 
cution will take a time that is much longer relative to all the other 
times, and hence all but one or two jobs will be congested at one node, 
thus limiting the mean throughput rate. 


APPENDIX C 
Product Form Distribution Results 


The mathematical model considered in this section is a special case 

of that considered in the previous section: 
(t) One type of job that migrates amongst S stations or stages 
(i) A single processor available to execute a job at stage K = 

Tees 
(tit) N tasks or jobs circulate among the nodes 
(tv) Tx denotes the total mean amount of service required by a job 
summed over all its visits to stage K = 1, --- , S. 

The system state is denoted by 0: 


Ss 
a= { (dh, -++ ds) | > Je= Wh. 
K=1 


At any given instant of time, the system is in state J = (Ji, --- , ds), 
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where Jx, K = 1, --- , S denotes the number of jobs at node K (both 
waiting and in execution). The long-term time-averaged distribution 
of number of jobs at each node at an arbitrary instant of time is 
assumed from this point on to obey a so-called product form or 
separation of variables formula 


1 S 
PROB(A = Ki, +++,Js=Ks)=—-[] TH (Ki, «++, Ks) €@ 


N I=1 


Gyn = Gn(Th, +--+ , Ts) = Px It TH. 


Q J=1 


The interested reader is referred to the literature® for probabilistic 
assumptions that lead to this type of probability measure on Q, the 
admissible state space. Gy is the system partition function chosen to 
normalize the product form to a probability measure.* Granted these 
assumptions, we observe that the mean throughput rate of jobs making 
a complete cycle of the system is given by 
hie PROB(Jx>0) — Gy-i(Ti, «++ , Ts) 
TK Gn(Ti, +++, Ts) 

Our goal is to obtain tighter upper and lower bounds on mean 
throughput rate and hence mean delay than we obtained in the 
previous section, using this additional information. We begin by ob- 
serving that 


S 
s Gy-1(T1, OPS sg Ts) 2 Tx 
YS a 
A 2 Te= Grn, ---, Ts) 


is a symmetric function of the S variables T'x, i.e., we do not change 
the value of the function when we interchange any two variables. This 
property allows us to show that this function has its maximum when 
all the variables are equal to one another. This follows from calculating 
first the gradient of the function at that point and showing that it is 
zero, and second showing that the Hessian, the determinant of all 
partial second derivatives, is negative definite at that point. An alter- 
nate way of seeing this holds is to realize that the gradient is zero (from 
the symmetry of the function) at the point where all coordinates are 
unity, so this point must either be a minimum or a maximum; we then 
evaluate the function at a neighboring point, say the point where all 
coordinates except one equal zero, and see that this is Jess than at the 
point where all coordinates are one, so this must be a maximum. 
Hence, we see 
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2 SGy-1(T1 = 1, oS. ey Ts = 1) 
A Tk = 
2 = Gn(T; = 1,---, Ts = 1) 


s S+N-—2 
N-1 SN 
<= 
S+N-1\ S+N-1 
N 
We now rearrange this upper bound to see 
N 
A = igi, 0 . 
» Tx ~ (N — 1) Twerage 
K=1 


The first term in the denominator is the mean time for a job to make 
a complete cycle through the network: 


Ss 
Tycle = Y Tk. 
K=1 


The second term in the denominator is the mean amount of time per 
node spent by a job in one cycle of the network: 


1 Ss 
T: verage = 2 Tk. 
a g S > K 
The same method is now used to obtain a lower bound on the mean 
throughput rate, by observing that 


1_ Gx(Ti, +++, Ts) 
XN” GvalTi, «++, Ts)’ 


Without loss of generality, we number the nodes such that node S is 
the node that will do the greatest amount of processing on a job on the 
average during one cycle: 


Ts= max Tx. 
K=1,...,S 


This can be used to rewrite the above expression: 


1 G eee 1 

a ae (Ti, , I's ) 

A Gy-1(Ti, --+ , T's) 
Since the second term is positive, this immediately gives us an upper 
bound on the mean throughput rate: 


fees: 
Ts 
In words, node S is the bottleneck node, in this sense. 
We now rewrite our expression for the reciprocal of the mean 
throughput rate: 
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1 S-1 
-=T7Ts5t+ >} TrF(T, «++ , Ts) 
A K=1 
Gx(Tiy +++ , Ts- 
F(T,, +--+, Ts) Ss Ne ee 


Gy-i(Ti, +++, Ts) Y Tx 
K=1 


We now manipulate this expression as we did above: 


Gti = 1, eee, TS-1 = 1) 


BD $0 7) Se 
( = 6-1, T=) 


oS) (S+N— 2)! 
7 N NS — 2)! 1 
“iee ' Ca ~ (S-D(S+N-2! N 
N-1 (S — 1)!(N — 1)! 
Combining all this, we see 
1 1 S-1 
x =Tst+ wd Tk 
Rearranging, we see 
5 a <i. 
Y% Tx + (N — 1)Tmax 
K=1 


The first term in the denominator is the mean time for one job to 
make one complete cycle of the network: 


Ss 
Teycle i », Tk. 
K=1 
The second term is the maximum mean time a job spends at any one 


node in the network: 


ee = max Tr. 
K=1,..-,S 


In summary, we see 


N ' 1 N 
ee ee = A =min tT? 6. a Sp ee 
x Tk + (N = 1) Tmax ssi > Tx 2 (N a 1) T average 
K=1 K=1 
ees. eee =i = min : 
aL ejecta a (N a 6 arama = Pre fee So (NV —_ 1) f SS 


If the average time per node spent in execution by one job during a 
cycle and the maximum time per node per job are roughly comparable 
to one another, these bounds will be quite close to one another. 
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